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Abstract — Approaching the 1.5329-dB shaping (granular) gain 
limit in mean-squared error (MSE) quantization of R" is im- 
portant in a number of problems, notably dirty-paper coding. 
For this purpose, we start with a binary low-density generator- 
matrix (LDGM) code, and construct the quantization codebook 
by periodically repeating its set of binary codewords, or them 
mapped to m-ary ones with Gray mapping. The quantization 
algorithm is based on belief propagation, and it uses a decimation 
procedure to do the guessing necessary for convergence. Using 
the results of a true typical decimator (TTD) as reference, it is 
shown that the asymptotic performance of the proposed quantizer 
can be characterized by certain monotonicity conditions on 
the code's fixed point properties, which can be analyzed with 
density evolution, and degree distribution optimization can be 
carried out accordingly. When the number of iterations is finite, 
the resulting loss is made amenable to analysis through the 
introduction of a recovery algorithm from "bad" guesses, and 
the results of such analysis enable further optimization of the 
pace of decimation and the degree distribution. Simulation results 
show that the proposed LDGM-based quantizer can achieve a 
shaping gain of 1.4906 dB, or 0.0423 dB from the limit, and 
significantly outperforms trellis-coded quantization (TCQ) at a 
similar computational complexity. 

Index Terms — granular gain, shaping, LDGM, source coding, 
decimation, belief propagation, density evolution, performance- 
complexity tradeoff 



I. Introduction 

THE mean-squared error (MSE) quantization problem of 
R n [2, Sec. II-C] can be formulated as followsQ let A be 
a discrete subset of W l (the quantization codebook, or simply 
codejQ and Q\ : M™ — » A be a quantizer that maps each 
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'Notational conventions: Z and M are respectively the set of integers 
and real numbers. ||-|| is the Euclidean norm. |„4| is the cardinality of 
set A. = denotes asymptotic equality, usually with respect to block length 
n — » oo. log(-), entropy and mutual information are computed in base-2, 
while ln(-) and exp(-) are base-e. Bold letters denote sequences or vectors 
whose elements are indicated by subscripts, e.g. y = (j/i, . . . , y n ), and 
sub-sequences are denoted by = (yi,y i+1 , . . . ,yj) or ys = (yi)ies- 
Addition and multiplication on sets apply element-by-element, e.g. U+2U 1 = 
{u + (2di , . . . , 2dn) | u G U,d{ € Z}. x mod [a, b) (or simply (x)[a t b)) 
is defined as the unique element of (x— (b— a)Z)n[a, b), and similarly x mod 
[a, b) n or (x)[ a b yi is the unique element of (x — (b — a)Z n ) n [a, b) n . The 
unit "b/s" means "bits per symbol". 

2 Ref. [2] assumes that A is a lattice, but in practice neither the trellis in 
TCQ nor the non-binary codebooks proposed here are lattices. Therefore, we 
allow A to be any discrete set, and definitions are modified accordingly. 



y 6 R™ to a nearby codeword Q\(y) G A. The mean-square 
quantization error, averaged over y, is given by 

<7 2 = limsup— \— ■- [ \\y-QA(y)\\ 2 dy. (1) 

The objective is to design A and a practical quantizer Qa(-) 
such that the scale-normalized MSE G(A) = a 2 p 2 l n is 
minimized^ where p is the codeword density 

p = lim sup — ±— | A n [-M, M] n | . (2) 

In this paper we consider asymptotically large dimension- 
ality n. By a volume argument, it is easy to find a lower 
bound G* = ■J^- for G(A). This bound can be approached 
by the nearest-neighbor quantizer with a suitable random 
codebook e.g. in [2], whose codewords' Voronoi regions are 
asymptotically spherical, but such a quantizer has exponential 
complexity in n and is thus impractical. The simplest scalar 
quantizer Ai = Z n , on the other hand, has the 1.5329-dB 
larger Gi = G(Ai) = yj, which corresponds to the well- 
known 1.53-dB loss of scalar quantization. In general, we call 
10 log 10 (G(A)/G*) the shaping loss of a quantizer, and it is 
also the gap of the granular gain and shaping gain defined 
in [3], for source and channel coding respectively, toward the 
1.53-dB limit. 

MSE quantizers with near-zero shaping losses are important 
in both source and channel coding. In lossy source coding, 
the shaping loss naturally dictates rate-distortion performance 
at high rates [3]. In channel coding on Gaussian channels, 
MSE quantizers can be used for shaping to make the channel 
input closer to the optimal Gaussian distribution [4]. Basi- 
cally, instead of transmitting the channel-coded and QAM- 
modulated signal u (each element of u corresponding to one 
symbol in the code block), we transmit x = u — a with 
a. = Qa(u) £ A, which should be closer to Gaussian, u 
and a are separated at the receiver side, and the shaping loss 
determines the achievable gap from channel capacity at high 
SNRs. Shaping is particularly important in dirty-paper coding 
(DPC) [5] on the channel 

y = x + s + z, (3) 

where x is the transmitted signal, s is the interference known 
only at the transmitter, and z is the "MMSE-adjusted" noise. 
Using an MSE quantizer, arbitrarily large s can be pre- 
cancelled without significantly increasing signal power by 
transmitting 

x = u — s — a, with a = Q\(u — s), (4) 

3 This agrees with the definition of G(A) for lattices in [2]. 
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so that the received signal 

y = u — a + z. (5) 

Again, the receiver must separate u and a, and the shaping loss 
determines the achievable gap from channel capacity. In this 
case, however, due to the lack of receiver-side knowledge of s, 
the rate loss caused by non-ideal shaping is most significant at 
low SNRs and can be a significant fraction of channel capacity 
[6]— [9]. For example, the shaping quantizer in [9] has 0.15dB 
shaping loss, corresponding to a rate loss of 0.025 b/s, yet in 
the 0.25-b/s DPC system this is already 10% of the rate and is 
responsible for 0.49 dB of its 0.83-dB gap from capacity. Apart 
from its obvious application in steganography [10], DPC and 
its extension to vector channels (similar in principle to vector 
precoding [11] but done in both time and spatial domains) are 
also essential in approaching the capacity of vector Gaussian 
broadcast channels such as MIMO downlink, therefore the 
design of near-ideal MSE quantizers is of great interest in 
these applications. 

Currently, near-optimal MSE quantizers usually employ 
trellis-coded quantization (TCQ) [12], in which A = U + 2Z n 
or U + 4Z n with U being respectively the codeword set of a 
binary convolution code or a 4-ary trellis code. The number 
of required trellis states increases very rapidly as the shaping 
gain approaches the 1.53-dB limit, and the computational 
complexity and memory requirement are thus very high. This 
is particularly bad at the receiver side of DPC systems, where 
the BCJR (Bahl-Cocke-Jelinek-Raviv) algorithm must be run 
many times on the trellis in an iterative fashion to separate u 
and a [9], resulting in a time complexity proportional to both 
the number of trellis states and the outer iteration count. 

Inspired by the effectiveness of Turbo and low-density 
parity-check (LDPC) codes in channel coding, it is natural 
to consider the use of sparse-graph codes in quantization. In 
[13] Turbo codes are used in quantization of uniform sources, 
but convergence issues make the scheme usable only for very 
small block sizes n, and the shaping loss is thus unsatisfactory. 
In [14]— [16], it is shown that low-density generator matrix 
(LDGM) codes, being the duals of LDPC codes, are good for 
lossy compression of binary sources, and practical quantiza- 
tion algorithms based on belief propagation (BP) and survey 
propagation (SP) have also been proposed in [17] and [18], but 
these works consider binary sources only. Practical algorithms 
for the MSE quantization of M™ with LDGM codes have not 
received much attention before. Even in the binary case, little 
has been done in the analysis of the BP quantizer's behavior 
and the optimization of the LDGM code for it. 

In [1], we have addressed the problem of MSE quantization 
using LDGM-based codes of structure A = U + mZ™, known 
as m-ary codes, where each u e U is a codeword of a 
binary LDGM code when m = 2, and is the combination 
of two codewords, each from a binary LDGM code, by Gray 
mapping when m = 4. The degree distributions of the codes 
are optimized under the erasure approximation, and shaping 
losses as low as 0.056 dB have been demonstrated. 

In this paper, we will improve upon the results in [1] by 
using better analytical techniques and more accurate methods 
for code optimization. We start in Section [TT] by analyzing 



the minimum shaping loss achievable by this m-ary structure 
using random-coding arguments. Although binary quantization 
codes have significant random-coding loss, they are analyzed 
first due to their simplicity. In Section [TTTJ we present the 
quantization algorithm for binary codes, which consists, like 
[18], of BP and a guessing ("decimation") procedure to aid 
convergence. 

Like LDPC, degree distribution plays an important role in 
the performance of LDGM quantization codes, but the use 
of decimation makes direct analysis difficult. To solve this 
problem, we propose the typical decimator (TD) as a subopti- 
mal but analytically more tractable version of the decimation 
algorithm, and analyze first its use in the simpler binary 
erasure quantization (BEQ) problem in Section IIVI which 
also forms the basis for the erasure approximation in [1]. We 
find that the TD can obtain asymptotically correct extrinsic 
information for decimation, and a solution to BEQ can be 
found with such information, as long as the code's extended 
BP (EBP) extrinsic information transfer (EXIT) curve [19] 
characterizing the fixed points of the BP process satisfies 
certain monotonicity conditions. For a given LDGM code, the 
most difficult BEQ problem it can solve is then parametrized 
by a monotonicity threshold I* hr , and the degree distribution 
can be optimized by maximizing this J' hr . 

In Section [V] these arguments are extended to our MSE 
quantization problem, and similar monotonicity conditions are 
obtained, which can be checked by quantized density evolution 
(DE). These DE results can be visualized with modified EXIT 
curves, and a similar method to the BEQ case can then be 
used for degree distribution optimization. 

We have assumed iteration counts L — > oo in the above 
analysis. In Section I VII we proceed to analyze the impact of 
finite L. We will show that a finite L causes "bad" guesses in 
decimation, and a recovery algorithm is sometimes required 
for BP to continue normally afterwards. With recovery, the 
loss due to finite L can be characterized by the delta-area A[ 
between the EBP curve and the actual trajectory, which will be 
used in the subsequent optimization of the pace of decimation 
as well as the degree distribution. 

All these results are extended to m-ary codes (where m = 
2 K ) in a straightforward manner in Section IVIII Numerical 
results on MSE performance in Section IVIIII shows that 
LDGM quantization codes optimized with the aforementioned 
methods have the expected good performance and can achieve 
shaping losses of 0.2676 dB at 99 iterations, 0.0741 dB at 1022 
and 0.0423 dB at 8356 iterations, the latter two of which are 
far better than what TCQ can reasonably offer and are also 
significantly better than the results in [1]. Indeed, a heuristic al 
analysis on the asymptotic loss-complexity tradeoff carried out 
in Section [IX] indicates that LDGM quantization codes can 
achieve the same shaping loss with far lower complexity than 
TCQ. We conclude the paper in Section |X] 

II. Performance Bounds of m-ARY Quantizers 

In this paper, we consider A with a periodic structure 
A = U + niL n , where U is a set of 2 nR codewords from 
{0,1, . ..,m — 1}™ with each u — u(b) £ U labeled by a 
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binary sequence b G {0, l} nR , We call A an m-ary rate- 
R quantization code. In this section, we will analyze the 
achievable shaping loss by this periodic structure. 

Given the source sequence y G W 1 , for each u = u(b) € U 
the nearest sequence to y in u + mZ n is x(b) = y — z(b), 
where z(b) = (y — tt(6))x« is the quantization error and 
X = [— -ttjt)- The quantizer has then to minimize ||z(b)|| 
over all b's, or equivalently, to maximize 



-t\\ z (b)\\ 2 



IF 

3=1 



(6) 



for some constant t > 0. The chosen b is denoted fo^, the 
corresponding quantization error is z y — z(b y ), and the 
resulting MSE (fl~|i then become^ 



1 

rri" 



1 



'V 



2 dy = — 

n ./[ ,i) 



n 7[0,m 

where (•} denotes averaging over a G {0, 1, . . . , m 



+ a 



1}". 



(7) 



A. Lower Bound of Quantization Error 

Given A, for each source sequence y € [0, m) n , let 

Qy= E iy^>- ^ 

6e{o,i}" H 



Since q y (b y ) < Q y , we can lower-bound the mean-square 
quantization error — ||%|| 2 as 



-WzA 2 



--^- t l nq y {b y ) > -—InQy. 



(9) 



Now let y = y + a with y G [0, 1)™ and a G {0, 1, . . . , m — 
1}™, and average over a, then from Jensen's inequality 

-(ll^+all 2 ) > --(lnQ y+a ) > --ln{Q y+a ), (10) 
n \ I nt nt 

where (Q y + a ) can easily be found to be 



(Qy+a) 



~tnB, 



HQy r/ with Qy = e- t(5+ < (11) 



3=1 



a 2 in (Q can be lower-bounded by integrating ( fTOb over y. 
For asymptotically large n, we only need to consider (strongly) 
typical y with respect to the uniform distribution on [0, 1), i.e. 
whose n elements are nearly uniformly distributed over [0, 1). 
We thus have 



> -—, I In (Qy+a) dy 
nt 7[o,i)" 



t 



'[04) 
In m — R In 2 



In Qy dy 



o 



(12) 
(13) 



4 For large n, jTJ is mostly just an average over strongly typical y with 
respect to the uniform distribution on [0, m), i.e. those whose elements are 
approximately uniformly distributed over [0, m), and the rest of this paper 
considers such y only. In shaping and DPC applications, y can be a modulated 
signal that does not follow the uniform distribution, and in such cases it may 
be necessary to "dither" y before quantization by adding to it a random 
sequence uniformly distributed in [0, m) n and known by the dequantizer, in 
order to obtain the expected MSE performance. 



This bound holds for any t > and is found to be tightest for 
t satisfying H t — logm — R (this t is hence denoted t (R)), 
when it becomes a 2 > P t . H t and P t are defined as 



H t = — J p z (z) logp z (z) dz, 
Pt= z 2 p z (z) dz, 



(14) 
(15) 



Pz(z) 



Si, y = z mod [0,1). (16) 



B. Achievable Quantization Error with Random Coding 

For asymptotically large n, we will see that the aforemen- 
tioned lower bound is actually achievable by random coding, 
that is, with the 2 nR codewords in U independently and 
uniformly sampled from {0,1,..., m — 1}™ (allowing for 
duplicates) and using the nearest-neighbor quantizer. 

Again we assume y G [0, m) n , and since the MSE — \\z y \\ 2 
is bounded for any y, we can consider only typical y's with 
respect to the uniform distribution on [0, m). Define 



{ue{0,...,m-l} n | \\{y-u) In \\ 2 <nP t } (17) 

as the set of possible codewords that are "sufficiently close" 
to y, and we can compute — log \U y \ with large deviation 
theory. If it is larger than log m — R, with asymptotically high 
probability UDU y ^ 0, thus some x G U + m1 n can be found 
whose MSE toward y is no more than P t . Since this is true 
for most typical y, the average MSE a 2 cannot exceed P t by 
more than a vanishingly small value. 

To compute ^\og\U y \ for a typical y, we define the 
type p y (u) of a sequence u as the fraction of each u G 
{0, 1, . . . , m — 1} at the positions in u whose corresponding 
elements in y are approximately y. Denoting the number of 
sequences u with this type as N[p y (u)], we have 

- log N\p y (u)] = - [ H y (u)dy, (18) 
n mj 

where H y (u) is the entropy 



Hy{u) 



m— 1 

E 

u=0 



P V (U) \OgPy(u) 



and u eW» becomes the constraint 




)\py{u) dy<P t . 



(19) 



(20) 



According to large deviation theory, -^\og\U y \ is asymptoti- 
cally the maximum of ( fT8l under the constraints ( f20b and 



P V (U) > 0, 



E 

u=0 



P„(u) = l, y€[0,m). (21) 



This is a convex functional optimization problem over p v (u) 
(a function of both y and u), which can be easily solved with 
Lagrange multipliers. The maximizing p y (u) is found to be 



p y (u) 



e -t{y-u) 2 2 

Qy 



y = y mod [0,1), 



(22) 
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and the resulting 



lOg \Uy\=H t 



(23) 



By the argument above, as long as Ht > log m — R, i.e. t < 
to(R), random coding can achieve a 2 < Pt for asymptotically 
large n. 

C. Marginal Distribution of Quantization Error 

From the p y (u) result in d22l . the marginal distribution of 
an individual Zj — (yj — uj)x under random coding can also 
be obtained as 

fm-i \ 

Py(u)8(z -(y- u)i) dy 



Vz(z) 



1 



™> JO 

m jo 

„-tz 2 
Qy 



\u=0 
fm-\ 

E 

\u=Q 

z el, 



Qy 



-S(z -(y- u)x) dy 



(24) 
(25) 
(26) 



where 5(-) is the Dirac delta function and y = z mod [0, 1) = 
y mod [0, 1). This is simply the p z (z) in ( fToT ), and H t in (fl4l 
and Pt in (15[ are respectively the entropy and average power 
of this distribution. 

D. The Random-Coding Loss 

We have shown that a random quantization codebook 
with the nearest-neighbor quantizer is asymptotically optimal 
among rate-P quantization codes of the form A = U + ml/ 1 . 
Therefore, its shaping loss represents the performance limit 
of such codes, and can be viewed as the cost incurred by the 
period-m structure. 

For asymptotically large n, the random m-ary quantizer 
has average MSE a 2 = Pt with t = to(R) and density 
p = 2 nR /m n , so the achieved G(A) = a 2 p 2 l n = P t (2 R /m) 2 . 
The shaping loss 10 log 10 (G(A)/G*) can then be expressed 
as 101og 10 (P t /P t *), where P t * = ^(m/2 R ) 2 is the power 
of a Gaussian with entropy H t = log m — R. We called it the 
random-coding loss, and it is plotted in Fig. Q] for m = 2 and 
in = 4. For large m and moderate R, Qy in ( fTTT i approaches 
a constant, p z (z) is close to a Gaussian distribution, thus 
P t « P* and the random-coding loss is close to zero. 

III. The Binary LDGM Quantizer 

As random quantization codes with the nearest-neighbor 
quantizer are obviously impractical to implement, it is natural 
to look into sparse-graph codes as practical candidates for 
achieving near-zero shaping losses. In [14], it has been shown 
that LDPC codes are unsuitable for BEQ but LDGM codes 
work well, therefore we will also use LDGM codes in MSE 
quantization. We consider the simplest m = 2 case first, and 
in Section IVHI we will look into codes with larger m that are 
not as limited by the random-coding loss. 

We thus consider A = U 4- 2Z™ with U being the codeword 
set of an LDGM code, i.e. each u e U is of the form u = 
c = bG, where b e {0, l}" b , n\, = nR and the low-density 
generator matrix G = (ffij)n b xn is randomly generated from 
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(a) binary code (m = 2) 
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= 

to 
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(b) 4-ary code (m = 4) 



Fig. 1. Random-coding losses of binary and 4-ary quantization codes. For 
binary quantization codes, the minimum loss is approximately 0.0945 dB at 
t = 3.7 and R = 0.4130 b/s. For 4-ary codes, the minimum loss is only 
O.OOlOdB at approximately t = 2 and R = 0.9531 b/s. 



some degree distribution that will be optimized below. Given 
such a code, q y (b) in d6) can be represented by the factor 
graph [20] in Fig. |2(a) 3 The c-nodes (shorthand for the factor 
nodes Cj, j = 1, . . . ,n) represent the relationship c = bG, 
whereas each factor e _t ^ J_Ci ^ x in © is included in the prior 
on variable Cj as 



A i( c )=nr e 



-t(Vj 



Pz((Vj - c)i), 



(27) 



where Qy j (with yj = yj mod [0, 1)) serves as the normaliza- 
tion factor. 

The belief propagation algorithm (also known as the sum- 
product algorithm) can then be run on this factor graph. Unlike 
the case of LDPC decoding, here BP does not usually converge 
by itself@ Instead, we rely on BP to generate "extrinsic 
probabilities" v° for each bi after a number of iterations, with 
which hard decisions are made on some foj's (called decimation 
following [17]). Subsequent BP iterations use these hard 
decisions as priors A^, and the resulting updated v^'s are used 
for more decimations. This iterative process continues until 
a definite b is obtained that hopefully has a large q y (b) and 
thus a small quantization error. This quantization algorithm is 
shown in Fig. [3] with a BP part and a decimation part in each 
iteration. As is intuitively reasonable, each time we decimate 



In the factor graph, symbols such as and Cj denote variable and factor 
nodes, while bi and Cj are the variables themselves. A ^ = denote the 
set of indices i for which there is an edge connecting b; and cj. In belief 
propagation, is the priors on variable bi, v° is the computed extrinsic 
probabilities for bi, jit9 denotes a message from node bi to cj, and so on. 
The priors, posteriors and messages are all probability distributions [20], in 
this case over {0, 1}, and here we represent them by probability tuples (rather 
than L-values, which are equivalent). For example, is viewed as a tuple 
(A s b (0), A£(l)) satisfying A£(0) + A t b (l) = 1 (the normalization is done 
implicitly), which corresponds to L-value ln(A b (0)/A b (l)). "0" and "©" 
refer to the variable-node and check-node operations in LDPC literature, i.e. 
(Mo!Mi)©(Mo>A 1 'i) = (Mo^O'^i^i) (implicitly normalized) and (hq,^)® 
Kp/4) = (MqMo + MiMi'MoK +MiM )- = C 1 ' )' 1 = (M) and 
* = (2' are res P ec ti ver y me "sure-0", "sure-1" and "unknown" messages. 
H(fj,) = — /Ltrj log no — fii log fii is the entropy function for ji = (jio, fii). 

intuitively speaking, when doing LDPC decoding with SNR higher than 
threshold, the transmitted codeword is usually much closer to the received 
sequence (and thus much more likely) than any other codeword, allowing BP 
to converge it. In the case of quantization with LDGM codes, there are usually 
a large number of similarly close codewords to the source sequence, and BP 
cannot by itself make a decision among them. 
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(a) original form (b) perturbed form 



Fig. 2. The factor graph of the binary LDGM quantizer. Circles are variable 
nodes and black squares are factor nodes. The gray area contains random 
b-to-c edges, and each edge from hi to Cj corresponds to gij = 1 in the 
generator matrix G. We mostly use the original form (a) with the priors 
A^(c) = p z ((yj — c)x). The equivalent "perturbed" form (b) is used in 
Section |23 where y = y mod [0, l) n , a = (y - y) mod 2 £ {0, l} n , 
and u = c = (c — a) mod 2. With y fixed, each prior A? is a hard decision 
clJ. Since z = (y—c)x = (y—E)x, the prior on Cj is A^ (c) = Pz((ijj — &)z), 

the "most certain" bit b^, with 

i* = argmax max ^, b (&), (28) 

and it is decimated to its most likely value 

b* = axgmaxz/!l(6). (29) 
be{o,i} 

This is called the greedy decimator (GD). Alternatively, for 
the convenience of analysis we will also look at the typical 
decimator, implementable but with worse performance in 
practice, in which the bit index i* to decimate is chosen 
randomly in £ (the set of yet undecimated bits) with equal 
probabilities, and its decimated value b* is b G {0,1} with 
probability v°,(b). 

The number of bits to decimate is controlled through the 
estimated mutual information in b-to-c messages (i.e. 
the Hij's), which is made to increase by about AI^ m in 
each iteration. This amount of increase A/f" 11 , possibly a 
function of the current /be an d hence called the pace of 
decimation, makes the algorithm terminate within L iterations 
if followed exactly, though the actual iteration count L can be 
somewhat different. Uniform pacing is used in [1], i.e. A/ b " m 
is a constant 1/Lo- in this paper, the pacing is optimized in 
Section IVI-DI to obtain somewhat better MSE performance. 
Increasing Lq also improves MSE performance, but more 
iterations would be necessary. 

The decimation algorithm can either be unthrottled or throt- 
tled. The unthrottled version used in most of our simulations 
simply decimates until the increase of in the iteration 
reaches A/ b " m . In the throttled version introduced in [1], the 
amount of decimation per iteration is instead controlled by 
<5 m ax, which is smoothly adapted, as shown in Fig. [3] to make 
/be increase eventually at the desired pace. 

More will be said on the decimation algorithm in Sec- 
tion [VI] but we will first discuss the optimization of LDGM's 



A£(c) <= MiVi - c )i). 3 = 1, • • ■ , n, c = 0, 1 {1 = [-1, 1)} 
Mi| *> * = 1. • • • i «b> j = 1) • • • : n 
Aj? •<= *, i = 1, ...,n b 

5 •<= {1, 2, . . . , rib} {the set of bits not yet decimated} 

<5max <= 0, I hc <J= 

repeat {belief propagation iteration} 

for j = 1 to n do {BP computation at Cj } 

end for 

for t = 1 to Hi, do {BP computation at b^} 
end for 

I be <= 1 - ( n b4) _1 H (^ij) {estimate new / bc } 

8 <= {amount of decimation so far in this iteration} 
Set A/™ m according to the desired pace (e.g. to UOlt ) 
if Jj~ < /be + A-ffcc" 1 t nen {little progress, do decimation} 
repeat 

i* <= argmax ig £ max;, u°(b) {6j» is the most certain bit. . . } 
b* argmaXfjg/Q x j vr, (b) {...whose likely value is 6*} 
S <=S+ (-log 4. '(&*)) 

/+ <= + K^r 1 Hoftj 

A t b , <^ 6*, fj^ <=6«Jg AT b t c {decimate 6,; to b*} 

e^= e\{i*} J 

until 8 > 5 max or 1+ > I bc + A/™' 1 or £ = 
end if 

5max rnax(0.8<5 m axi 

1.255) 

until 5 = 

&i <^ (resp. 1) if A^ = (or 1), i = 1, . . . , n b 
c bG, ii -4= c 

"j = (Vi ~ c j)l x i = Vj - Zj, j = l,...,n 

Fig. 3. The binary quantization algorithm. The throttled version is shown 
above, while the unthrottled version is without the 8 > <5 max condition in the 
until statement. The choice of i* and b* corresponds to the greedy decimator. 



degree distribution and the choice of t in Sections [IV] and [V] 

IV. Degree Distribution Optimization for Binary 
Erasure Quantization 

Like LDPC codes, LDGM quantization codes require op- 
timized degree distributions for good MSE performance. The 
performance of LDGM quantizers has been analyzed previ- 
ously in [15] for binary sources, but this analysis, based on 
codeword-counting arguments, is applicable only to nearest- 
neighbor quantization and not very useful for the above BP 
quantizer. In [17]'s treatment of LDGM quantization of binary 
sources, degree distributions of good LDPC codes in [21] are 
used directly, inspired by the duality between source and chan- 
nel coding in the erasure case [14]. In our previous work [1], 
LDGM degree distributions are instead designed by directly 
fitting the EXIT curves under the erasure approximation (EA), 
also known as the BEC (binary erasure channel) approximation 
[22]. Both methods perform well, but they are heuristic in their 
analysis of decimation, and may thus be suboptimal. 

In this and the next section, we will give a detailed anal- 
ysis on degree distribution optimization of BP-based LDGM 
quantizers that properly takes decimation into account, which 
should allow better MSE performance to be attained. Under the 
erasure approximation, we are in effect designing an LDGM 
quantization code for the simpler binary erasure quantization 
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problem and using it in MSE quantization^] Therefore, we 
will first focus on BEQ in this section, and in Section [V] the 
methods given here will be extended to MSE quantization, 
with or without the erasure approximation. 

A. Binary Erasure Quantization 

The binary erasure quantization problem can be formulated 
as follows [14]. The source sequence has the form y G 
{0, 1, *}", where "*" denotes erased positions and occurs with 
probability e. A binary code U consisting of 2 nR codewords 
u = u(b) G {0,1}", each labeled by b G {0, l} nR , is 
then designed according to e and the rate R. For each y, the 
quantizer should find a codeword u G U such that yj = Uj 
or yj — * for all j — 1, . . . ,n, i.e. u agrees with y on all 
non-erased positions. The number of non-erased positions in 
a given y is denoted by n nc , which is approximately n(l — e) 
for large n. Ideally rib = nR can be as small as this n(l — e), 
i.e. R = 1 — e, but in practice higher rates are necessary. 

Similar to ©, q y (b) can be defined as 



iy( b ) = II </., (%(&)), 



II Vj =U j0 [*, 

vA ]> ] otherwise, 



(30) 



and the quantizer can equivalently find, for a given y, some b 
such that > (which then equals 1). 

When U is the codeword set of an LDGM code and u = 
c = bG as in Section HIT1 q y (b) can be described by the factor 
graph in Fig. |2(a)| as well, where each A^(c) is a normalized 
version of q yj (c), i.e. A^ is 0, 1 or * if yj is respectively 0, 
1, *. Apart from this difference in A^, the algorithm in Fig. [3] 
with the typical decimator can be used here for the purpose of 
analysis, though the recovery algorithm in Section IVI-BI will 
be necessary for good performance in practice. 

The BEQ problem may alternatively be viewed as a set of 
linear equations 

bG nc = y nc (31) 

over the binary field GF(2) = {0, 1}, where G nc and y nc are 
the n nc columns of G and y that correspond to non-erased 
positions of y. Denoting by n r the rank of G uc , (l3TT l then has 
2" b ~" r solutions for 2" 1 of the 2™ no possible y nc 's, and for 
other y nc 's there is no solution at all. 

We first assume that ( t3TT > has a set B of 2" b_ " r solutions, 
then Py (b) = 2-^-' n ^q y (b) is a probability distribution for 
b that is uniform over B. Using this p y (b), similar to the BP- 
derived extrinsics v°, the true extrinsic probabilities v°* of bi 
can now be defined as 



Vi*Q>) =Py(bi = b\br\ {i} = b*^ {i} ), i 



1. 



(32) 

which depends on the set T of decimated bits and their 
decimated values b*jr. Note that v°* can only be 0, 1, or *: it is 

V\{i} 



b if all solutions with = &jr\|j> haw bi = b G {0, 1}, 



7 In this paper we only consider codes chosen randomly, through random 
edge assignment, from the LDGM code ensemble with a given degree 
distribution, therefore only the degree distribution is subjected to optimization, 
and we will not distinguish between codes and degree distributions. 



and otherwise there must be the same number of solutions 
with bi = and with i»j = 1, making v°* = *. 

Without loss of generality, the typical decimator can be 
assumed to decimate in the order of b\ , &2 > • • • j b nh . Decom- 
posing p y (b) into 

Pv(b) = Py(bi) Py (b2 1 fox) • ■ -Py(bn h I b^ 1 ), (33) 

each factor p y (bi \ b 1 ^ 1 ) is then the v°* after the decimation 
of b 1 ^ 1 into '*. We therefore construct the fictitious true 
typical decimator (TTD), which is just like the TD except 
that decimation of bi is done according to v°* rather than v°. 
Moreover, the TTD shares the source of randomness with the 
TD, so decimation is still done in the order of b\, ... , 6„ b , and 
each bi is decimated to the same value except to account for 
the difference between v° and ^ b *HThe TTD in effect samples 
a b* according to the probability distribution p v (b), so it must 
yield a random solution b* G B. If, for every i — 1, . . . , rib, 
the TD at the time of foj's decimation has v° = v°* ', then it will 
run synchronously with the TTD and yield the same solution 
in B. Otherwise, e.g. if v° = * and v°* = for some i, then 
the TD might decimate 6j to 1, which will eventually result 
in a contradiction. Therefore, our first requirement for TD to 
find a solution to (l3TT l is that BP must compute the correct 
extrinsic probabilities after enough iterations, which is hence 
called the extrinsic probability condition. 

How, then, to ensure the existence of solutions to (|3TT > for 
an y 2/nc? We may define Q y with ^ which, for each y ne , 
gives the number of solutions to (l3TT l and is 2 nb-,lr for 2™ r 
y nc 's and zero for the rest. Q y , if normalized by 2~™ b , is 
again a uniform distribution over these 2" r y nc 's. We then 
require n r = n nc , making Q y a uniform distribution over all 
2«no possible y no 's, so that the BEQ problem have 2" b ~" no 
solutions for any y nc . This is the other condition for BEQ to 
be always solvable by the TD, hence called the equi-partition 
condition. 

For n — > oo, the two conditions above are now suitable for 
analysis with density evolution methods, which in the BEQ 
case can be accurately done with EXIT charts, as will be 
discussed in the following subsections. 



B. Fixed Points and EXIT Curves 

We use b-regular, c-irregular LDGM codes for quantization 
as suggested by the LDGM-LDPC duality in [14]. Let d b be 
the right-degree of all b-nodes, and denote w cc i as the fraction 
of c-nodes with left-degree d and w C( j = dw c a/ (Rd^) as the 
corresponding fraction of edges. 

Assuming that the BEQ problem does have solutions for 
the given y, with the one found by TTD denoted b* and 
u* = c* = b*G. Assuming additionally that our quantizer 
based on TD has decimated a fraction lb of the b-nodes 
and has so far maintained synchronization with the TTD in 
decimation decisions, b* is then consistent with the current 

8 For example, the TD and the TTD can use the same i.i.d. random sequence 
ti,t2, . . . , T„ b in decimation with each Tj uniformly distributed in [0, 1), 
and each bi is decimated to in the TD if Tj < f^(0) and in the TTD 
if Ti < f^*(0), and to 1 otherwise. In this way, the decimation results are 
always the same if v. = v , and are rarely different if v° and u b * are close. 
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h (bit) 7 b (bit) h (bit) 

(a) EBP curves of (4, 2) regular LDGM code (b) EBP curves of (5, 3) regular LDGM code (c) Comparison of EBP, BP and MAP 



Fig. 4. The EBP curves of some (d^, d c ) regular LDGM codes, in which all b-nodes have right-degree and all c-nodes have left-degree d c . (a) The (4, 2) 
regular code has rate R = 0.5 and monotonicity threshold 7* hr = 1/3. For 7* hr < I c < R, part of the EBP curve lies in the 7^ < half-plane, although ij, 
is monotonically increasing once it becomes positive. This implies a violation of the equi-partition condition. For 7 C < 7^ hr , the monotonicity conditions are 
satisfied, (b) The (5, 3) regular code has rate R = 0.6 and monotonicity threshold 7^ hr = 7/16 = 0.4375. When 7 C is reduced to 0.5176, the EBP curve no 
longer extends into the 1^, < half-plane, but it is still not monotonic until 7 C is further reduced to . (c) A comparison of the EBP, BP and MAP curves 
of the (5, 3) regular code at 7 C = 0.5, assuming that the results in [19] remain true. The area Ai to the right of the MAP curve represents the bi's whose 
l/. = b* but v° = * and thus violate the extrinsic probability condition. That is, the values of these bits are determined by previous decimation results but 
not available from BP at the time; they are apparently "guesses" until they are "confirmed" by an equal number of equations encountered later represented 
by A2. Here A 2 = Ai, which intuitively means that all confirmations constrain earlier guesses rather than y n c, so the equi-partition condition is satisfied. 
This is not the case for e.g. the (4, 2) regular code at 7 C = 0.5 in (a): there the MAP and the BP curves overlap with the EBP curve in the 7^ > half-plane 
but does not extend to the left, and the area between the EBP curve and the 7b = axis represent "confirmations" that, having no earlier guesses, must be 
satisfied by j/ nc , therefore the equi-partition condition is not satisfied. 



priors and can serve as the reference codeword: all /7^ c and 
fijl' s, with bi decimated or not, must be either b* or * and 
never contradict the reference codeword. Denoting by e.g. 
the average mutual information (MI) in the /i b j's from the 
previous iteration about their respective reference values b*, 
which in this case is simply the fraction of /ijy that equals 
6*0 and using the usual fact that the factor graph becomes 
locally tree-like with high probability as n — > 00, we can find 
the EXIT curve relating the input /be for the c-nodes and their 
output 7 c b, hence called the c-curve, to be 

/cb = 4^«cd/ b i c" 1 » (34) 

d 

where I c is the MI of the A^'s, in this case 1 — e. The b-curve 
relating I c t, and the output It, c from the b-nodes (denoted by 
I^c as it refers to the next iteration) is likewise 

/ b + c = l-(l-/b)(l-/cb) db " 1 . (35) 

To analyze the extrinsic probability condition, it is necessary 
to look into the behavior of BP's fixed points, which are 
characterized by the EBP EXIT curve first proposed in [19] 
for LDPC decoding over BEC. The EBP curve relates the a 
priori MI It, at fixed points (i.e. the It, making l£ c = 7 b c), 

/b = l-(l-/bc)/(l-/cb) db ~\ (36) 

and the extrinsic MI in the i/ b 's, i.e. the fraction of v° that are 
b* rather than *, 

/b,cxt = 1 - (1 - /cb) db , (37) 

as /be goes from to 1 and J cb given by d34l l. Fig. [4] shows 
the EBP curves of some codes for example. Note that Ib.ext 

'in this paper, all such Mis and EXIT curves are also averaged over the 
LDGM code ensemble with the given degree distribution. Assuming that 
relevant concentration results hold, for n — ► 00 we can also talk about the 
convergence behavior of a specific code using these ensemble-averaged Mis. 



is always non-negative and monotonically increasing with I^ c , 
but It, in d36t is not necessarily so. 

Every crossing the EBP curve makes with a constant-/ b 
vertical line corresponds to a fixed point of BP at this It,, and 
when the number of iterations L — ► 00, it is clear that BP will 
follow the minimum- /b.cxt fixed point as J b goes from to 
1, forming the BP EXIT curve in [19]. The MAP (maximum 
a posteriori probability) EXIT curve in [19, Definition 2] is 
simply the relationship between the fraction It, of decimated 
bits and the average true extrinsic MI in the f b *'s, as is evident 
from [19, Theorem 2], where the random vector b (currently 
taking value b*) is the X in [19], the b-priors A b are the 
BEC output Y, and the c-priors (or y) are the additional 
observation f2. 

Interestingly, our BEQ problem is now very similar to 
the LDPC decoding problem on BEC considered in [19], as 
both involve a system of linear equations over GF(2) that 
has at least one solution (b* for LDGM-based BEQ and the 
transmitted codeword for LDPC-over-BEC) consistent with 
all previous guesses0 In particular, the area above the MAP 
curve is H(b \ y)/n\ > [19, Theorem 1], with H(b \ y) being the 
entropy of the aforementioned p y (b); under the equi-partition 
condition (ED should have 2" b ~"" c = 2 nh{1 ~ h / R) solutions, 
so this area is 1 — I c /R, and the area below the MAP curve 
is I c /R, while if the equi-partition condition is violated (|3TT > 
will have more solutions for the current y (and none for many 
other y's), and the MAP curve will have a smaller area below 
it. On the other hand, the area below the EBP curve can be 
computed directly from (f36t and d37k this area is also I c /R 
if v c \ = 0, and when v c i > it is defined as the total gray 
area in Fig. [5] which is smaller than but close to I c /R. 

10 The only difference is that the number of equations in BEQ, n no , is 
random whereas in LDPC decoding over BEC it is always the number of 
check nodes. This should not be essential though. 
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Fig. 5. The area under the EBP curve (the thick solid curve) when v c i > 0. 
In such cases the EBP curve does not start from (0, 0), and we define the 
area below it as the total area of the two gray regions, whose respective areas 
are shown in the figure. Note that the lower area 1 — (1 — I c v c i) b is smaller 
than dt,I c v c i, so the total area is smaller than I c /R, but is very close to it 
in the codes we will encounter since d^I c v Q \ is at most 0.03 or so. 



If the results in [19] on the relationship between MAP, BP 
and EBP curves remain true, these three curves should be given 
by Fig. |4(c)| Heuristic arguments below the figure suggest that 
the extrinsic probability and equi-partition conditions above 
for the TD to solve the BEQ problem are satisfied, with a 
vanishing fraction of exceptions as n — > oo, if and only if the 
EBP curve satisfies the following monotonicity conditions?^ 



> 0, xG[0,l], 



(38) 
(39) 



JbU=o > o, 
dx 

where I b is viewed as a function (|36*1 i of x = Ibc- We now 
prove this using similar methods to [19]. 

Necessity. The extrinsic probability condition means that 
v° = v°* for all but a vanishing fraction of % e {1, . . . , rib} at 
any 1^ after enough iterations, which implies that the two have 
at least the same average MI, i.e. the BP curve coincides with 
the MAP curve, the area below which is in turn I c /R under 
the equi-partition condition. Since the BP curve follows the 
minimum fixed points on the EBP curve, and the area under 
the latter is at most I c /R, the two curves must coincide as 
well, which immediately leads to (l38l and ( 13 91 , 

Sufficiency. Under d38l and (1391 . the BP curve obviously 
coincides with the EBP curve, and since (l38l implies v c \ = 0, 
the area below them is I c /R. BP can never give any infor- 
mation not implied by y and previous decimation results, i.e. 
for any i we have either v° = v°* or v° = *, so the MAP 
curve cannot lie below the BP curve and the area below it is 
at least I c /R. We have also shown that the area below the 
MAP curve is at most I c /R, therefore equality must hold and 
the equi-partition condition is satisfied. Now that the MAP 
and BP curves also coincide, for any It, the i^ b 's will have the 
nearly the same average MI as the i^ b *'s (with the difference 
vanishing after many iterations when n — > oo), and since any 
v\ ^ v°* implies v° — * and v°* — b* and thus a difference 
in MI, it can only occur for a vanishingly small fraction of 
i's. Therefore the extrinsic probability condition also holds. □ 

1 1 Note that this has nothing to do with monotonicity with respect to a class 
of channels, which appears often in LDPC literature [23]. 



We will see below that the monotonicity conditions are more 
easily satisfied for smaller I c , so for a given code, we can 
define the maximum I c that satisfies them as the monotonicity 
threshold, denoted by /' hr . This is the maximum (1 — e) for 
which the BEQ problem can, in an asymptotic sense, be solved 
by the TD. The same performance is expected for the greedy 
decimator, since in BEQ it is basically identical to TD. 

It should be noted that the monotonicity conditions are suffi- 
cient for the extrinsic probability and equi-partition conditions 
only in the sense that the fraction of violations approaches 
zero as the block size n and the iteration count L go to 
infinity. Therefore, in practice some contradictions will occur 
in the TD, and some equations in OTb will be unsatisfied. In 
Section IVI-BI we will propose a method to deal with such 
contradictions, such that the number of unsatisfied equations 
remains a vanishing fraction of n. 

C. Optimization of the Monotonicity Threshold 

We can now optimize the degree distribution so that J* hr is 
maximized and approaches its ideal value R. 

From (l36*b and ( [34-b . it is easy to show that the condition 
(l38l is equivalent to v c i = 0, i.e. there are no degree- 1 c- 
nodes. As for the second condition d39l ). differentiating (f36b 
with respect to x = gives (hence we denote y = 1 — J C b) 

^ = y- db (y-ic- (d b - i)(i - x) J2(d - i)v cd x d - 2 ^j . 



Making d40b nonnegative, we get 

1 



Ic < 



s(x)' 



e [0, l] 



where 

s(x) =Y J VcdX d - 1 + {d h -l){l-x)Y,(d-l)v cd x d - 2 . (42) 

d d 

Therefore, the monotonicity threshold is 



^ thr = ( max six) 
ixe[o,i] 



(43) 



and it can be maximized by solving the following optimization 
problem over s max = l/i^ hr and v c d, d = 2, 3, . . .: 

minimize s max 

Subject tO s(x) < Smax, e [0, 1], 

E 1 >r^ Vcd 1 (44) 

d d 

v C d > o, yd. 



Rd b 



In practice, the s(x) < s max constraint is applied to a number 
of discrete x's (1000 values uniformly spaced over [0, 1] seem 
to suffice), and the set of c-degrees is chosen to be the 
exponential-like sequence 

V = {d k \ k = l,2,...,\V\, dt=2, d k+1 = \0-d k ]}, 

(45) 

where we set (3 — 1.1, and \T>\ is made large enough not to 
affect the final result. Since s(x) is linear in v c d, d44i i then 
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TABLE I 

Impact of d b in BEQ (R = 0.4461 b/s) 



4 


6 


7 


8 


9 


10 


11 


jthi 


0.4110 


0.4294 


0.4376 


0.4416 


0.4437 


0.4448 


d max 


6 


10 


19 


37 


70 


127 



becomes a linear programming problem that is easily solved 
using usual numerical methods. 

In Table H] we list the optimal 7* hr achieved at different 
values of db as well as the resulting maximum c-degree d™ ax . 
We see that approaches its ideal value R exponentially fast 
with the increase of db, but the necessary rf™ ax also increases 
exponentially. Due to the problem's simplicity, it is probably 
not difficult to prove this. 

V. Degree Distribution Optimization for MSE 
Quantization 

It is well known that long LDPC channel codes can be 
effectively analyzed and designed using density evolution 
methods, not only over BEC but also over general binary- 
input symmetric channels [21]. Such methods are also useful 
for LDGM quantization codes, but their application is not as 
straightforward as the LDPC case due to the stateful nature of 
decimation, its use of extrinsic probabilities (which is available 
in DE only for the final iteration, at the root node of the 
tree-like neighborhood), and the lack of a "natural" reference 
codeword in quantization as is available in channel decoding. 

In Section IIV1 we have solved these problems in the BEQ 
case by introducing the TTD: the result of TTD is used as the 
reference codeword, with which decimation can be modeled 
by the priors A^ with a single parameter lb, and the extrinsic 
probabilities at each decimation step can be analyzed sepa- 
rately. In this section, we will extend this TTD-based method 
to MSE quantization so that code optimization can likewise 
be carried out with DE. When the erasure approximation is 
used in DE, we obtain the same optimized degree distributions 
for BEQ, but we can also avoid EA and do a more accurate 
optimization using the quantized DE method a la [21], [24]. 

A. Density Evolution in MSE Quantization 

Without loss of generality, suppose the source sequence y £ 
[0, to)™, which can, as in Section [II] be decomposed into y = 
y + a, where y <E [0, 1)™ is assumed to be typical with respect 
to the uniform distribution over [0, 1), and a £ {0, 1, . . . , m — 
1}™. For a fixed y, we may define, similar to ©, 

q(b;a) = q y+a (b) = e -*ll f , (46) 

which can be regarded as a probability distribution over b 
and a after normalization. With Qy+ a defined in ([8]), this 
distribution can be decomposed into 

q(b; a) = Q s - p(b; a) = Q s ■ P(a) Pa (b) , (47) 



where 



P(a) = 



Qi 



Pa(b) = 



q(b; a) 

Qy-\-a 



are respectively probability distributions over a and over b 
conditioned on a, and 



E 



Qy-\ 



m n (Qy + a) 



3 = 1 



llj 



= 2™ b exp [n / In Q y dy 



(49) 



(50) 



using ( fTTT i and the typicality of y. 

The quantization of y = y + a is equivalent to finding a b 
for a given a that (approximately) maximizes q(b; a). Again, 
we consider the typical decimator since the greedy decimator 
is difficult to analyze, and the order of decimation is assumed 
to be bi , &2j • • • , b nb without loss of generality. With the true 
extrinsic probabilities v°* of bi defined like d32b according to 
p a (b), the decomposition 

Pa{b) = Pa(&lK(&2 | h) ■ ■ - Pa (b nb | 6J*- 1 ) (51) 

again has each factor p a (bi — b \ bj -1 = b^ -1 '*) equaling the 
Vi*(b) when previous b 1 ^ 1 has been decimated into b l {~ '*. 
The TTD is then the decimator similar to TD but using v\* 
instead of v°, so it yields decimation result b* with probability 
p a (b*), and the TD attempts to synchronize with it. 

In addition, a can be viewed as the product of a source 
generator before quantization but after y is determined. This 
can be shown more clearly on the equivalent factor graph 
Fig. |2(b)| All priors on aj and bi, and A^, being ini- 
tially the source generator first determines a%, . . . , a n by 
setting A^ to hard decisions, and the quantizer then determines 
bi , . . . , b nb . In the source generation process, BP can be run 
to yield the extrinsics i/*, and the true extrinsic probabilities 
is** can likewise be defined with P(a). Similar to the TTD, 
we define the true typical source generator (TTSG) as one 
generating each a with probability P(a). Since 

P(a) = P{a x )P{a 2 \ oi) • • ■ P(a n \ a^ 1 ), (52) 

and each factor P(aj = a\a\~ 1 ) is the v**(a) when erj - 
has been determined, the TTSG simply sets each aj = a with 
probability i/j*(a). In reality, all 2™ possible values of a are 
equally likely to occur, so we can safely assume that a comes 
from the TTSG if and only if P(a) is a uniform distribution, 
that is, each v** must be * when aj -1 has been determined. 

When both the TTSG and the TTD are used, each possible 
(b,a) is generated with probability p(b;a). Define u = 
(u(b) — a) mod to, each u then corresponds to 2 nb (b, a)'s, 
all of which having the same 



p(b;a) = _L e -*ll(S-«)x"ll 2 



(53) 



(48) 



and the total probability of generating u becomes 

n , 

p{u) = 2 nb p{b; a) = Y[ — e-*^- 8 ^ (54) 

n n 

= Y[Pz((yj - Uj)i) = Y[pz(zj) (55) 

3=1 3=1 

from d49l ) and ( fTSI l, noting that z = (y — u)j™ = (y — u)x^. 
Eq. d55l l shows that Uj can be viewed as i.i.d. samples condi- 
tioned on yj with probability density p(u \y) — p z ((y — u)x), 
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so for n — ► oo u will be strongly typical according to 
this conditional distribution with high probability, and the 
quantization error z is likewise strongly typical with respect 
to p z (z), so the resulting MSE is P t . 

To achieve this P t with the TD, again we have 

• extrinsic probability condition: v° must be close to v\* 
when decimating each bi, so that the TD can synchronize 
with the TTD; 

• equi-partition condition: P(a) must be a uniform distri- 
bution so that the use of TTSG here matches reality and 
does not pick "easy" source sequences with large P(a) 
too often. 

It may be interesting to note the relationship between the two 
conditions and the two inequalities in ( TTOb - 

Similar to the BEQ case, we assume that y is generated by 
the TTSG and use TTD's final result b* and the corresponding 
u* = c* = b*G as the reference codeword, then each Aj?, 
v°, u?H and ri^ have reference value b* and each A 1 ; has 
reference value c*, and DE can be carried out with respect 
to these reference values to analyze the above two conditions. 
The density of A^ (actually that of A^(c*)) can be obtained 
from d27l i using the strong typicality of z = (y — it*)i- 
with respect to p z (z). Furthermore, assuming that the TD had 
been synchronized with the TTD in all previous decimation 
decisions, A b is then b* at the decimated positions (whose 
fraction is denoted I b as before) and * elsewhere. We thus 
have all the necessary information for DE. 

In BEQ, we have found ( l38l l and ( f39l > to be sufficient and 
necessary for the equi-partition and extrinsic probability con- 
ditions to be satisfied with a vanishing fraction of exceptions. 
According to the definition of the EBP curve, d39l > and (l38l l 
correspond to two properties of the code and I c in DE: 
« Starting from any lb G [0, 1], DE converges to a unique 
fixed point regardless of the initial message density, 
provided that this initial density is intuitively "consis- 
tent", i.e. free of contradictions and not over- or under- 
confident0 

> The fixed point at 1^ = is at It>,cxt = 0, corresponding 
to both Hifs and /Jji's being all-*. 
We conjecture that these properties, which are again called the 
monotonicity conditions, are sufficient and necessary for MSE 
quantization as well. 

Proving this equivalence rigorously appears difficult^] We 

12 For binary quantization codes, this consistency can be defined rigorously 
as the symmetry condition of a message density in [21, Sec. III-D]. In BEQ, 
symmetry with respect to b* of e.g. the density of ft?? means that each fjq'r 
is either b* or * but never the opposite "sure" value (which would indicate a 
contradiction). In MSE quantization, it means that, with /ij?? being a randomly 
chosen b-to-c message, the probability density of jj!^ (b* ) at p and at 1 — p 
have ratio p : (1 — p) for any p g [0, 1]. All priors have symmetric densities 
when using binary codes, and the symmetry of the initial message density will 
thus be maintained throughout the DE process. The symmetry condition is not 
necessarily true in non-binary cases, so we keep using the term "consistency" 
for generality. 

13 The MAP EXIT curve can use basically the same definition [19, Defini- 
tion 2]; the area theorem [19, Theorem 1] still holds because only the Q there 
is different, while the Y there, corresponding to the A^'s, can still be viewed 
as BEC outputs. The area below the MAP curve is therefore 1 — H(b \ y)/n h , 
where H(b \ y) is the entropy of the distribution p a (b), and this area is again 
I c /R under the equi-partition condition using 453t . The EBP curve can also 



can, however, provide the following heuristic argument. For 
any number of iterations when n is sufficiently large, 
a randomly selected node bi will likely have a tree-like 
neighborhood in the factor graph within depth 21. If DE has a 
unique fixed point, for sufficiently large I the message density 
after I iterations no longer depends much on the initial message 
density from the un-tree-like part of the factor graph, so the 
resulting K ( b 's from BP, which is accurate for a tree-like factor 
graph, should be mostly accurate here0 As for the equi- 
partition condition, when the fixed-point at lb = does not 
correspond to all-* messages, in Fig. |2(b)| the Uj* rj i/j will 
not be all-* when the TTSG determines the last elements of 
a, so P(a) will not be a uniform distribution. 

Experiments show that these monotonicity conditions are 
more easily satisfied when t is small, but the resulting MSE 
P t will be larger. We thus define the monotonicity threshold 
t thl of a code as the maximum t that satisfies these conditions. 

As in BEQ, the above conditions are only sufficient in 
an asymptotic sense. In practice, even if t < t thl , the TD 
will desynchronize with the TTD due to the finite block 
length n and iteration count L, and a recovery algorithm from 
"incorrect" decimations is necessary to achieve acceptable 
performance with TD, though the greedy decimator usually 
performs adequately without recovery. This will be discussed 
in detail in Section [VI-CI 

Unlike BEQ, in which the monotonicity conditions mean the 
difference between being able and unable to find a solution 
(allowing for a vanishing fraction of unsatisfied equations), 
in MSE quantization the non-satisfaction of these conditions 
simply causes the asymptotic MSE to be higher than P t , 
which is dependent on t anyway. We will set t — t thr , 
so that we have an MSE P t thr that is asymptotically (as 
the block length n and the iteration count L go to infinity) 
achievable and analytically tractable, and we can then design 
the degree distribution to maximize t and make it approach 
its ideal value to(R), which corresponds to random-coding 
performance in Section III-BI However, further optimization 
on the choice of t is possible. 

B. The Erasure Approximation 

Similar to BEQ, the average Mis 1^, It>,cxt> ^bc> let an d Ic 
can now be defined for the densities of respectively \\, v°, 
/ijy, (i^ and A^, e.g. I bc is the average l-H(^) with H(-) 
defined in footnote [5] When the message densities satisfy the 
symmetry condition in footnote [T2] this is actually the average 
mutual information between the messages and their respective 
reference values. 

be obtained through DE, although its unstable branches may require tricks 
similar to [25, Sec. VIII] to find; but we no longer know the area below it. 
More importantly, the "erasure" relationship v° = v°* or v° = * in BEQ is 
no longer true, so it is difficult to relate the average Mis to the closeness of 
individual v° and ifi* 's, which was essential in our BEQ analysis. 

l4 The un-tree-like part of the factor graph is apparently difficult to deal 
with rigorously. A related proof is [25, Sec. X] on the accuracy of individual 
BP-extrinsic probabilities (represented by conditional means) when the BP 
and MAP generalized EXIT (GEXIT) curves match, which is based on the 
concavity of the GEXIT kernel relating conditional means and the "general- 
ized entropy" used by GEXIT. However, given the factors in the un-tree-like 
part of the factor graph, it is not clear why we have [L~p (Y) = E[Xi \ Y£j] 
in [25, Lemma 15]. 



MANUSCRIPT 



11 



In particular, from ((27) we can eventually obtain I c as 

I c = \og2-H t = 1-H t , (56) 

with H t defined in ( fT4t . This relationship allows us to define 
the monotonicity threshold alternatively in terms of I c , as 
7 c thr = 1- H ttbI , or t th * = t Q {I c ). 

When all densities are erasure-like, i.e. every message, as in 
BEQ, is either * or b where b is the message's reference value, 
< f34T > and d35| ) obviously hold. In general, / c b is not uniquely 
determined by and I c , nor is /be by I c t, and but (|34l > 
and (l35T l are still approximately true [26], [27], and the erasure 
approximation assumes them to be exact. The fixed points of 
DE are then characterized by the same EBP curve d36l l and 
(I37t . and according to the conditions above, the monotonicity 
threshold is the same as that given by d431l . In other 
words, the optimized degree distribution that maximizes the 
monotonicity threshold for MSE quantization under the EA is 
the same as that for BEQ. Of course, the true I' hr of this 
EA-optimized code will differ from that in ( |43] >. 



C. Quantized Density Evolution 

Besides the erasure approximation method, the analysis 
given above also enables density evolution to be carried out 
directly on quantized messages, which allows for arbitrarily 
good precision. Our DE scheme is similar to that in [24]. 
Without loss of generality, we can assume that b* and thus 
u* and c* are all-zero, in which case z = (y)jn should be 
strongly typical with respect to p z (z), and the density of A^'s 
can accordingly be computed with (127) . The messages are 
represented by uniformly quantized /-values, plus two values 
representing and 1. The b-node operations, which simply add 
up the /-values, become convolutions on densities that can 
be computed with fast Fourier transform (FFT), while c-node 
operations are decomposed into that between two messages 
and computed by table lookup 

To verify the monotonicity conditions at a certain t, two 
DE processes are then performed, one starting from all-* 
density with 1^ gradually increasing from to 1 (recall that 
A^'s density is always erasure-like), and the other starting 
from all-0 with / b gradually decreasing from 1 to 0. For the 
uniqueness of fixed points required by the extrinsic probability 
condition, it appears sufficient to check that the above two 
processes converge to the same fixed point at the same ib 
within the accuracy of quantized DE, and the equi-partition 
condition can be checked by observing whether the latter 
process converges to all-* messages when lb reaches zero. 
The monotonicity threshold t thi ' (corresponding to an I* hr ) is 
then the maximum t that satisfies these conditions. 

15 In LDPC optimization there are only one or two distinct check-degrees, 
but in LDGM quantization codes many more different c-degrees may exist, 
therefore it may seem tempting to represent the densities by instead the "dual" 
L-values, L = - sgn(L) lntanh(|L| /2) (see e.g. [21, Sec. III-B]), so that 
the check-operations can be computed faster with convolutions. Unfortunately, 
uniformly quantized L is not able to represent high-confidence messages 
(those with a large |L|) with sufficient accuracy for this approach to work. 



D. The EXIT Curves for MSE Quantization 

In principle, it is possible to use directly the quantized DE 
method to find the monotonicity threshold of a given code, 
with which the code's degree distribution can be optimized 
with e.g. local search methods or differential evolution [21]. 
However, this is computationally intensive and unintuitive. 

The inaccuracy of EA is mainly due to the erasure-like den- 
sities used for computing the EXIT curves ( 1341 ) and (f35T > being 
very different from the actual message densities encountered 
in DE. If the EXIT curves are computed using instead the 
densities encountered in DE of some base code under a base t, 
then they are obviously accurate for that code and t. Moreover, 
locally, i.e. for codes with similar degree distributions and for 
similar values of t, the densities encountered in DE are usually 
similar, therefore it is reasonable to expect the error in EXIT 
caused by EA to be approximately the same. If we model 
this error by a "correction factor" r(x), optimization of the 
monotonicity threshold can then be carried out with EXIT 
curves just like the BEQ case, simplifying it immensely. 

Specifically, given a base code and a base t, we model its 
EXIT curves with three functions /(•), g(-) and h(-), such that 
the average Mis in DE satisfy, under that t, 

Icb = Ic • /(/be), (57) 

/ b + e = l-(l-/b)-.9(l-/eb), (58) 

4,ext = 1 - Hi - I ch ). (59) 
Note that the erasure approximation corresponds to 

f(x)=J2 v cdX d -\ (60) 

d 

g{y) = y db ~\ (61) 

h(y) = y d \ (62) 

/, g and h are obtained from quantized DE results. We start 
with e.g. the base code optimized with EA, and the base t is 
chosen near its i thr . DE is then performed, starting from all- 
* /iy density, with /b increasing from to 1 slowly enough 
that the message densities are always close to fixed points. The 
average MI is computed for each density encountered, and we 
thus obtain a number of data points that can be interpolated 
to form /, g and h. The derivatives f'(x), g'{y)/g(y) = 
d(\ng(y))/dy and h'(x) used in the optimization below are 
then computed with finite differences. 

Under EA, we observe from d60ll-(l62ll that 

• /(•) and h(-) are increasing and convex, so /'(•) and h'(-) 
are nonnegative and increasing; 

• lng(-) is increasing and concave, so its derivative 
s'(')/s(') ls nonnegative and decreasing. 

In our numerical experiments (e.g. Fig. [6j, we find that 
these observations remain approximately true for quantized 
DE results except for a slight non-concavity of In g(y) for y 
close to 1. This will be useful in the optimization below. 

E. Optimization of the Monotonicity Threshold 

Similar to the erasure case, the EBP curve can be obtained if 
we equate l£ in ( f58l and 7 b c in ( l57T i and plot the relationship 
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Fig. 6. The /(■), In g(-) and h(-) curves of an optimized LDGM quantization 
code with R = 0.4461 b/s and d b = 12 at t = 3.97 (I c = 0.4429). Each 
curve is obtained from quantized density evolution results by connecting one 
data point from each iteration. The dashed straight line in the lng(-) plot is 
meant to show its approximate concavity. 



between 



1 



1 



(63) 



(64) 



g{y) 

(where x = and y = 1 — 7 c b) and It>,cxt- The monotonicity 
conditions for the base code then again become (l38l and d39b . 
The condition d38l means that BP does not progress at all 
when lb = starting from all-* b-to-c messages, which still 
implies v c \ = 0, i.e. no degree- 1 c-nodes. As for 

dh = g(y) - J c ■ (1 - x)f'{x)g'{y) 

dx {g(y)) 2 

the condition is equivalent to (noting that g'(y) > 0) 

4^T > Ic • (1 - x)f'[x). (65) 

g[y) 

According to our observations above, g(y)/g'(y) is non- 
negative and mostly increasing with respect to y and thus 
decreasing with respect to I c , while the right side of d65l > 
is nonnegative and increasing with respect to I c . Therefore, 
for each x 6 [0, 1], d65l l is usually satisfied by all I c up to a 
maximum I^ c (a;) = l/s dc (a;) which can be found with e.g. the 
bisection method, and the base code's monotonicity threshold 
is thus 



jthr _ 



max s 

rS [0.1] 



(66) 



which has a similar form to d43l . 

A comparison of s(x) and s dc (x) is shown in Fig. [7] We 
can then define the "correction factor" of the base code due 
to EA as 



r(x) = 



s dc (x) 
s(x) 



X € [0, 1] 



(67) 



This r(x) does turn out to be relatively code-independent. 
Therefore, for any code with a similar degree distribution to 
the base code, its J* hr can be approximately obtained from 
with s dc (x) = r(x)s(x) and s(x) in (l42l) . Denoting 
= l//* hr , the optimization of 2^ hr now becomes 



minimize s max 

subject to r(x)s(x) < s max , Vx £ [0, 1], 



(68) 



d d 

v cd > 0, Vd e V, 



Rd b 




0.3 



Fig. 7. The l/s(x) and l/s de (x) curves for the optimized R = 0.4461 b/s, 
ctb = 12 LDGM quantization code. As this base code is already well 
optimized, its l/s dc (a;) is almost a flat line except for x close to 1, and 
its minimum 0.4427 b/s is 7* hr by {66), which is quite close to R. If this 
code had instead been optimized under EA, l/s(x) would be almost flat 
but l/s de (a;) would not be, and 7* hr in {66} would be smalle r. N ote that 
s de (x) and r(x) cannot be computed for x very close to 1, as {65) is then 
unsatisfied only for I c so large that y lies outside the range of available DE 
data. However, since s de (x) is expected to be large for x close to 1, the 
constraints r(x)s(x) < s max in {68) are not usually tight for such x and can 
simply be removed. 



which is a linear programming problem similar to d44l) that 
can be solved in the same manner. The solution of d68l ). 
presumably better than the original base code, can be used as 
the base code for another iteration of the optimization process 
in order to obtain a more accurate r(x). 2-3 iterations of this 
process usually give sufficient accuracy. 

F. Relationship to Previous Methods 

It is now instructive to analyze the code optimization 
approaches previously proposed in [17] and [1]. 

In [17], the duals of optimized LDPC codes are used in the 
LDGM quantizer for binary symmetric sources. Under EA, 
this duality is in fact exact [14]. Specifically, if the variable- 
nodes and check-nodes in the LDPC decoder are denoted 
respectively as q-nodes and p-nodes, the erasure-approximated 
EXIT curves can be given using similar notation by 

~ >qd(l - W" 1 , (69) 



Iqp = 1 - (1 - Iq 



d 

. j d p- 



(70) 

They become identical to ( f34b and ( 1351 ) when we replace each 
q with c, p with b, each MI I with 1 — I, and let I b = 0. 
At the threshold of the LDPC code, the only fixed point is 
at Iqp = J pq = 1, which translates to the LDGM code's EBP 
curve crossing 1^ = at /be = let = h,ext = only. The 
method in [17] thus, in effect, maximizes the maximum t and 
J c at which the EBP curve satisfies this condition, without 
additionally requiring I b to mono tonic ally increase along the 
curve (see the I c = 0.5176 case in Fig. |4(b)[ ). Also, this duality 
is not exact in non-erasure cases [27, Fig. 3], though such dual 
approximations are common in LDPC literature [28]. 

In [1], curve-fitting is carried out between the erasure- 
approximated EXIT curves 

Oil and OH at I b = and I c = R 
(i.e. t = to(R)). This is roughly equivalent to making I b as 
close to zero as possible along the EBP curve at I c = R. 
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The three EBP curves in Fig. |4(b)| illustrate the difference 
among the three optimization criteria. Clearly, the methods 
in [17] and [1] do not maximize the monotonicity threshold, 
which has been shown above to be a reliable indicator of MSE 
quantizers' performance. Nevertheless, for reasonably large c?b 
all three criteria tend to make the EBP curve close to the 
lb = axis except where 7b,ext ~ 1> thus the difference among 
the resulting degree distributions is not large. This explains the 
good performance obtained in these previous works. 

VI. Decimation 

Decimation, i.e. guessing the values of some 6,'s and fixing 
them to hard decisions, is an essential component of our 
LDGM-based quantization algorithm. Apart from the afore- 
mentioned [17] and [18], ideas similar to decimation have also 
appeared in [29] and [19] in the context of LDPC decoding 
over BEC. In [29], guessing is used when a stopping set is 
encountered, and backtracking within a limited depth allows 
guesses leading to contradictions to be recovered from. In 
[19], the use of guessing with full backtracking (the Maxwell 
decoder) leads to the relationship between the MAP, BP 
and EBP EXIT curves mentioned in Section IIV-BI The area 
argument in Fig. [4(c)] suggests that amount of guessing needed 
by the Maxwell decoder is dependent on the non-monotonicity 
of the EBP curve and is also proportional to the block length n. 
In practice, the backtracking depth is limited by its exponential 
complexity, so backtracking is not expected to provide much 
gain for large n and will not be considered here. 

Without backtracking, there will unavoidably be "wrong" 
decimation decisions, which in the above analysis means that 
the TD decimates some bi to a different value from the TTD 
due to a difference between v° and v°* . This difference can 
be caused by non-satisfaction of the monotonicity conditions, 
the finiteness of block length n, or most importantly, because 
the limited iteration count L has not allowed BP to converge. 
In this section, we will attempt to get a rough idea of the 
impact of such incorrect decimation, how to recover from 
them, and how to minimize this impact within a given number 
of iterations. 

A. Controlling the Decimation Process 

Within a limited number of iterations L, the determination 
of how much decimation to do in each iteration, possibly based 
on the current progress of convergence, is obviously important 
in minimizing the amount of "incorrect" decimations. In [17], 
bits that are more "certain" than some threshold are decimated 
every few iterations. In [18], upper and lower limits on the 
number of bits to decimate at each time are introduced in 
addition. An early version of our quantization algorithm, 
instead, decimates a number of bits whenever the quantizer 
gets "stuck" for a number of iterations. The downside of 
these decimation strategies is their reliance on manual ad- 
justment of various thresholds, which can be cumbersome in 
code optimization, as different codes may require different 
thresholds for acceptable performance. Instead, our unthrottled 
decimation strategy controls the amount of decimation by 
forcing /be to increase by A/ b " m per iteration, with A/™ 11 



possibly dependent on the current Ibc0 Although this pace 
can also be optimized according to the code, as will be done 
in Section IVI-DI a uniform pace of A/™ 11 = 1/Lq already 
performs well, making the strategy very convenient to use. 

The throttled decimation strategy shown in Fig. [3] was 
introduced in [1]. It is based on the observation that the /be 
estimated in the algorithm is noisy and tends to progress some- 
what erratically, sometimes even decreasing, which in the un- 
throttled algorithm causes unintended variation in the amount 
of decimation in each iteration. To reduce this variation, the 
throttled algorithm introduces <5 max , which can roughly be 
viewed as the amount of decimation per iteration. (5 max is 
slowly adjusted according to the actual pace of convergence, 
and upon reaching the steady state /be should be increasing at 
the desired pace. 

In practice, at a given Lq, throttling does improve MSE 
performance but also increases the actual iteration count L. In 
terms of the L-versus-MSE tradeoff, the unthrottled algorithm 
is better for small L, when the iterations necessary for <5 max 
to reach its steady-state value represent a significant overhead, 
but for Lq greater than about 10 3 the throttled algorithm 
perform better, therefore both will be used in our simulations. 
A detailed analysis and optimization of the throttling strategy 
is an interesting problem of optimal control, and may be 
worthy of further study. 

B. Impact of Imperfect Decimation in BEQ 

We begin analyzing the performance impact of non-ideal 
decimation by looking at the simpler BEQ problem, viewed 
as a set of linear equations OTb over variables b\ , . . . , 6„ b . 
With finite block size n and iteration count L, BP cannot be 
expected to find an exact solution, so our aim is to minimize 
the number of unsatisfied equations. 

Incorrect decimations are indicated by contradictions in BP, 
e.g. 001. If we proceed with BP after contradictions by simply 
settingO 01=5, a large fraction of unsatisfied equations will 
result^ Intuitively, as the contradictory messages propagate, 
they essentially set a variable bi to in some equations and to 
1 elsewhere and determine the values of other variables with 
these contradictory values, and the confusion thus spreads. 

To avoid this problem, each known variable should be made 
to possess a consistent value in all equations. A class of 

16 The bit granularity of the amount of decimation as well as random 
variations in the estimate can cause the actual iteration count L to differ 
from the intended Lq. If, instead of making T^c increase by a certain amount 
depending on its current value, we make it increase to some value according 
to the elapsed number of iterations, then L will be more predictable, which is 
desirable in practice. However, our current unthrottled and throttled strategies 
are yet unable to control the decimation process well enough in this case, 
resulting in a worse tradeoff between iteration count L and the achieved 
MSE, therefore this will not be adopted here. 

17 A more elaborate treatment of contradictions in BEQ can be given as 
follows. Instead of setting to hard decisions and 1 when the source 
symbol yj = and 1, it is "softened" to probability tuples (1 — 8, 8) 
and (8, 1 — <5), respectively, where <5 > is an infinitesimal constant. Now 
let Lq = log((l — <5)/<5), and each message fi = (fio,fii) can then be 
represented by the scaled L-value l(fi) = (1/Lo) l°g(^o/^i)- F° r <5 — > 
and with I = l(fi), I' = £(/i')> the definitions of "0" and "©" imply that 
Qfl') = l + l' and l(p © p') = max(i + V, 0) - max(Z, I'), thus belief 
propagation can be run using this scaled L-value representation. This results 
in a slightly lower, but still large, fraction of unsatisfied equations. 
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"serial" algorithms of the following form have this property. 
Initially all variables are unknown, and in each step the quan- 
tizer may either guess the value of one unknown variable, or 
discover the value of one unknown variable with an equation 
in which all variables but that one are known{3 This process 
repeats until all variables become known. Suppose n g guesses 
are made, then the remaining rib ~ n g variables are each 
determined by one unique equation. These — n g equations 
are always satisfied, while the remaining 

m = n ne - (n h - n g ) (71) 

equations have been ignored in the process and half of them 
are expected to be unsatisfied. 

For the original "parallel" BP algor ithm0 a "recovery" step 
from contradictions can be introduced into each BP iteration, 
which changes some c-priors A^ (in effect making BP use a 
different source sequence) to fix the contradiction. Specifically, 

• If all incoming /i^'s to some Cj are "known" (0 or 1), 
and A^ is "known" and disagrees with them, flip A^ (0 to 
1 and vice versa) such that they agree, and compute the 
outgoing /i^'s accordingly. 

• If the incoming fi^'s to some bi include both and 1, 

- randomly pick one "known" pj^ and denote its value 
by b with b 6 {0,1}; 

- for each j e Nf that /i^ ^ * and ^ b, flip A^ 
and recompute all messages from Cj\ 

- compute the outgoing Hij's from according to the 
new incoming messages. 

With this recovery step, the parallel BP algorithm works like 
one of the aforementioned class of serial algorithms. In each 
iteration, 

• BP at c-nodes assigns tentative values to previously 
unknown variables bi according to equations, and all 
equations that are already unsatisfied are ignored due to 
the first rule above. 

• BP at b-nodes with the second rule above picks one 
assignment among possibly many for each newly known 
variable. This assignment becomes one "discovery" step 
in the serial algorithm, while all other assignments are 
ignored. 

• Each decimation of a bi with v° = * constitutes a "guess" 
step in the serial algorithm. 

Therefore it does view every variable consistently, and (fTTT i 
is applicable. Clearly, incorrect decimations now only cause 
more flips in the recovery step, but they do not affect the 
fraction of "known" variables and messages in each iteration, 
which can then be computed assuming that all decimations 
have been correct. For asymptotically large n and the typical 
decimator, this is given by the evolution of Mis according to 
the EXIT curves 01, ^ and <|37). 

The path followed by (Ib,^b,oxt) during the actual quanti- 
zation process has thus a staircase shape as shown in Fig. [8] 
and it is hence called the actual curve. Since I\> indicates the 

18 The choice is left to the individual algorithms within the class. 
"Of course, BEQ itself is more efficiently solved by a serial algorithm, 
but only a "parallel" BP algorithm can be extended to MSE quantization. 
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Fig. 8. A comparison of the EBP and the actual 7(,- versus_ ^b,oxt curves. Here 
v c i > so the EBP curve does not start from (0, 0). The gray area between 
the two curves is the delta-area A\. The flowchart on the right shows the 
trajectory followed by the quantizer on the actual curve in a single iteration. 



fraction of decimated bits, and in each iteration Ib.cxt is the 
fraction among newly decimated frj's that have v° = or 1, 
the area above the actual curve A g = 1 — is the overall 
fraction of guesses %/n.b. We have found the area below 
the EBP curve to be A nc = I c /R = nl c /n,b (approximate 
when v c i > 0), so from (|7T1 i the delta-area Ai — A nc — A^ 
between the two curves is asymptotically rii/rib, and it can 
thus serve as a measure of the number of unsatisfied equations. 
As the number of iterations goes to infinity, the actual curve 
approaches the BP curve, and the delta-area goes to zero if and 
only if the monotonicity conditions d38l l and ( |39l l are satisfied. 

C. Impact of Imperfect Decimation: MSE Quantization 

For MSE quantization, simulation results show that the 
typical decimator by itself again has poor performance. The 
reason is similar to the BEQ case: imperfect decimation causes 
message densities to be no longer consistent, in effect contain- 
ing "soft" contradictions that slow down future convergence if 
not recovered from. The greedy decimator in Fig. [3] however, 
does achieve satisfactory performance in this case, presumably 
because it tends to choose better-than-typical codewords and 
the resulting gain can usually compensate for the effect of 
imperfect decimation. 

It is still of interest to make the more analytically tractable 
TD perform acceptably by recovering properly from incorrect 
decimations. The method is similar to the recovery at c-nodes 
in the BEQ case: if the prior A^ at some c, is inconsistent with 
the incoming messages p°j, as summarized by the extrinsic 
probability 

(72) 



1= e 

i'eAf b f 



then A^ is adjusted to fix the inconsistency, by using a slightly 
different i/j (which is recomputed in every iteration) instead 
of yj in d27] >. 

We first analyze the relationship that yj (or Ap and Vj 
should have if all decimations are correct, i.e. the equi-partition 
condition is satisfied and our TD is perfectly synchronized 
with a TTD. Assuming y G X n = [— 1, 1)" without loss of 
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generality, and using TTD's final result b* and the correspond- 
ing u* = c* = b*G as the reference codeword, we can define 
z = (y — u*)jn and p with pj = v C j(c*), which should then 
asymptotically satisfy the following typicality properties: with 
j being random, 

• Zj has pdf p z (z), because of z's strong typicality shown 
in Section IV^Al 

• pj's pdf at p and (1 — p) have ratio p : (1 — p) for any 
p G [0,1], due to the symmetry condition (footnote 112b 
also satisfied by the density of ifc; 

• Zj and pj are independent, since pj comes from the 
extrinsic v c -, which only depends on information other 
than yj in Cy's tree-like neighborhood in the factor graph. 

In the actual quantizer b* is unknown, so instead of pj only 
qj = Vj(l) is observable. From the above property of p, 
among those j's with qj near some q e [0,1], a fraction q 
should have c* = 1 and the rest have c* = 0, therefore the 
density of yj at these positions should be 

Py,g(v) = (! - l) ■ PM + 9 ' - l)z)- (73) 

This relationship d73l i can be checked by comparing the 
actual cumulative distribution function (CDF) of yj at the 
positions where qj « q, denoted by F y ^ q (y), to the CDF 
Fy,q(y) corresponding to p yjq (y). When they are different, our 
recovery algorithm attempts to find a y close to y such that 
the corresponding CDF of yj matches F y . q (y), thus allowing 
BP to continue as if decimation had been perfect. 

Denote Fo(y) — F y i/ 2 {y) as the CDF of the uniform 
distribution over I, and G(y) = F y .\(y) — F y< o(y), then 

F y . q (y)=F (y) + (q-l/2)G(y). (74) 

To help estimate Fy tq (y), it is similarly approximated as 

F m (y) = F {y) + (q-l/2)G(y), (75) 

so that only G(-) has to be estimated. For any y € I, F v „(y) 
is the average of l[yj < y] over positions j with qj « qj 20 l 
therefore G(y) can be unbiasedly estimated by 

E"=ife-V2)(ife<y]-^o(y)) 



G(y) 



Having obtained G(-) and thus Fy >q (-), we can let 



(76) 



(77) 



then yj should have the desired CDF F y . q (-) at positions j 
with qj ks q. 

In practice, G(y) is computed for a few discrete values of y 
that divide X into intervals. By first summing the correspond- 
ing (qj — i) for y 3 's falling in each interval, d76l ) for these y's 
can be computed in 0(n) time. Initially this estimated G(y) 
will be rather noisy and may need to be adjusted such that all 
CDFs remain monotonic and within range. The transform ( TTTb 
is then evaluated at these j/'s and a few discrete values of q, 
after which each yj is computed by bilinear interpolation. The 
symmetry of p y , q (y) and p y , q {y) (corresponding to F y , q (y)) 

20 l[?/j < y] i s defined as 1 if yj < y, otherwise. 



around y = can be further exploited to simplify this process. 
This recovery procedure is carried out at the beginning of each 
iteration (or possibly once every few iterations), after which 
the A^'s are recomputed with (|27T i using jjj for yj. 

When TD is used with recovery, the message densities can 
be kept approximately consistent after imperfect decimation, 
allowing the average Mis to evolve according to the EXIT 
curves 07] ). ( f58l l and (15 9t . and the actual curve as well as 
the areas and Ai can thus be similarly defined. We do not 
know of any definite relationship between the delta-area Ai 
and the MSE, as the amount of movement between y and y 
in recovery is hard to analyze. Nevertheless, simulation results 
suggest that the MSE can be roughly estimated by 



a 2 = [ 1 - 



Pt 



A, 



■Po, 



(78) 



I C /RJ I e /R 

where Po = Pt\t=o is the zero-rate MSE and is | in the 
binary case. Intuitively speaking, each yj can be viewed as 
a soft constraint on b that amounts to I c hard constraints, 
and the nl c hard constraints in total are represented by the 
area nl c /n^ = I c /R, which in our simulations appears to 
be the area below the EBP curve just like the BEQ caseF*! 
The area below the actual curve, Aa = I c /R — Ai, represents 
satisfied constraints having MSE P t , while the delta-area Ai 
represents ignored constraints, corresponding to quantization 
error uniformly distributed in T with MSE Po, therefore we 
obtain an explanation for d78l >. Even though d78l > is not exact, 
it does give a reasonably accurate relationship between Ai and 
the MSE, and the minimization of Ai will thus be our objective 
in the optimization of the pace of decimation below. 

D. Optimal Pacing of Decimation 

We can observe from Fig.[8]that a large number of iterations 
is needed to make the actual curve fit closely to the EBP curve 
and achieve a small delta-area, which is necessary for good 
MSE performance. Under a fixed number of iterations, this 
tradeoff can be improved somewhat by optimizing the pace 
of decimation, as will be discussed in this subsection. This 
iteration count will be denoted by L in the analysis here; it 
corresponds to Lq in the quantization algorithm, which may 
take a slightly different number of iterations to converge. 

Denote the Mis at each iteration I by e.g. 7^. If the 
deviation of the actual curve from the EBP curve is sufficiently 
small such that the DE results d57ll-(|59i> remain valid, we then 
have, for each I = 1, . . . , L, 



(0 



b 

Ai) 

1 bc 

r(0 _ 
'b,ext 



1- (!_/«). 5 (1 _/W), 



= l-h(l-I%) 



b 

cb 



All these Mis can be viewed as functions of 7, 
[0, 1], subjected to boundary conditions 



(i) 

be ' ' 



r(0) _ 
-'be — u ' 



1 bc — l > 



(79) 
(80) 
(81) 

' 1 bc fe 

(82) 



2 'At least, when the monotonicity conditions are satisfied, we expect the 
EBP curve to coincide with the MAP curve, the area below which is indeed 
I c /R as shown in footnote fO] 
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and monotonicity constraint (since there can only be more 
decimated bits after more iterations) 



< IP < 



(L-l) 



(L) 



1. 



The area below the actual curve is then 

L-i 



Ed 



b.cxt 



AO ) 



(83) 



(84) 



where we have set /, 



(0) 



r(0) 
b,ext 



uniform pacing used in [1] corresponds to = 1/ L, and we 



for convenience. The 

(0 



now optimize , . . . , l£ ±l to minimize the delta-area Ai, 
or equivalently, to maximize Ad in ([84-b . 

Usually 7 C < J* hr (or only slightly larger), in which 
case the monotonicity constraint d83l is frequently redundant. 
Ignoring this constraint, the maximization of A& can then be 
efficiently solved by dynamic programming. Specifically, for 
each — x S [0,1], define 



L-l) 



'be 



L-l 



and it satisfies the recursive formula 



A«\x) 



max 

r (0 



with Ajj = 0. The maximum of Ad is then A y ^'(0) plus the 



(86) 



constant term 

r(0) 



(1-4 



)f/ (1) 
A J h cxt 



4°it) = i-Mi-/c/(0)). 



(87) 



After discretizing x, the recursion 



can be evaluated 



numerically, obtaining the optimal , . . . , lj£ . 



If the solution thus obtained violates (1831 1. that is, this 
constraint turns out to be tight, a good but suboptimal solution 
can be found by imposing the constraint "greedily" during the 
recursion (l86l i: when computing the previous A^ +1 \x), the 
I b +lS> corresponding to the optimal -fbc + ^ is recorded along 
with the maximum for each x, and then the maximization with 
respect to 1^ is done under the constraint iP 1 ' < I b +1 ^ ■ 

When L is large, the above optimization can be simplified, 
which also enables us to analyze the asymptotic performance 
as L — > oo. For each I, the I b corresponding to I be:xt on the 
EBP curve, l£ , is determined by 



I, 



(l-l) 

be 



= i-(i-/: (i) ).. 9 (i-^) 



(0> 



(88) 



Comparing ® and (88), A/^ = - should satisfy 



(0 

be 



(l-l) 
be 



(89) 



For large L, I can be viewed as a continuous-valued variable 



and x 



r(i-l) 



is an increasing function of it, with dx/dl 



1^} — iPl 1 . AJu^ ef a/ can then be viewed as functions of 

be be b 

x rather than of I, and defining y = 1 — l\ > = 1 — I c f(x) as 
before, (|89l becomes 



(At: 



A/ b (x)- 5 (2/). 



(90) 



The number of iterations is then 



L - I — dx 

dx 



dx 



o &h{x)-g{yY 



(91) 



and since /, 



(0 _ 



1 — h(y) = 1 — h(l — I c f(x)), Ai becomes 



Jk= f AI b (x) 
Jo 



dx 



■ dx 



= / AI b {x)-I c -f'(x)-h'(y)dx. 



(92) 



(93) 



The constraint ( 1831 basically requires Ib(x) = I£(x) + AIb(x) 
to be non-negative and increasing with x. Note that this 
reduces to (l38l and ( |39l when L — > oo and thus AJb(a;) — > 0. 

Again, in practice d83l > is usually not tight and can be 
ignored at first, and the minimization of d93l (a functional 
of AJb(a;)) under constraint ( 1911 can then be solved with 
Lagrange multipliers. Setting 

^//"^ = W^)h'{y) ( . T (\\ 2 . , = 0, (94) 
d[AI b {x)\ {AI b {x)) 2 g{y) 

we find the optimal A/b(a;) 

AI b (x) - (A/ c ■ f(x) ■ g(y) • h'{y))~ 1/2 , (95) 

and d90i > then gives the desired increase of I bc per iteration. 
Substitute d95l > into d9lT ) and ( l93l and we get 



1 I c -f'(x)h'(y) 




g(y) 



dx 



(96) 



Therefore, L anc/ Ai are inversely proportional when L is 
large and d83t is not tight, which is an interesting result on 
the loss-complexity tradeoff of LDGM quantization codes. The 
right-hand side of d96l ) can be numerically evaluated and is 
generally slightly smaller than 4. For example, it is 3.365 for 
the optimized d b = 12 code used in the simulations below, 
and under the erasure approximation and d98l l below we get 
4(db — l)/<4» which approaches 4 for large d b . Indeed, when 
L is large, AI b (x) is basically scaled by different constants to 
achieve different tradeoffs between L and Ai, so from d9Tl > and 
d93b we see that this inverse proportional relationship is also 
true for other paces. For example, from d90l ), uniform pacing 
corresponds to AI b (x) = \/Lg(y), which results in 



LA\ = 



1 h ■ f(x)h'{y) 

g{y) 



dx. 



(97) 



For the same optimized d b = 12 code, d97b evaluates to 4.701, 
therefore for large L the optimized pacing of decimation is 
expected to require approximately 3.365/4.701 = 72% as 
many iterations as uniform pacing to achieve the same MSE 
performance. 

In practice, Ai is not very sensitive to AI b (x), so ( |95l l can 
be further approximated. We can observe that the EBP curves 
of good codes have ijj w for all x but those very close to 
1, which means iRil - g{y)- Taking derivatives, we have 



Ic-f'(x)-g'(y)^l, 



(98) 
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and d95b and d90l ) then become 



dx 



! g(y) -g'jy) 

X-h'(y) ■ 



(99) 



If the erasure approximations d6"Tl ) and ( f62b are used in 
addition, we get a simple formula dependent only on dt>: 



dx 

~dl 



i 



Xdb 
2(4 - 1) 
Ld h 



(d„-2)/2 



(l-x) ! 



(100) 



(101) 



where we have used x « l~g(y) and d9TT i in ( llOU . Eq. jlOU 
is still near-optimal: its I/^4i for the optimized db = 12 code 
is 3.443, only slightly larger than the optimal 3.365. 

In the actual decimation algorithm, we adopt such a pace 
by setting L to Lo and A/^ m to this dx/dl, with x being the 
Jbc estimated in the algorithm. 

E. Pacing-Aware Code Optimization 

Our code design method in Sections |IV] and [V] has focused 
on maximizing the monotonicity threshold 7* hr , and with t 
chosen such that I c = J* , this minimizes the resulting MSE 
P t as the delta-area approaches zero with L — > oo and n — > oo. 
We have mentioned at the end of Section IV-AI that this is 
not necessarily optimal; ideally t and the degree distribution 
should be jointly optimized, and when L is finite, the pace 
of decimation should be included in the joint optimization as 
well. Doing this optimization precisely would be prohibitively 
complicated with limited benefit, so below we will look 
at a simple heuristic adjustment on the degree distribution 
optimization process for finite L that nevertheless results in 
some performance gain. 

According to our analysis above, for large L, if the opti- 
mized pace of decimation given by d95l l and (|90~b does not 
violate the monotonicity constraint d83l ). then the resulting 
Ai is inversely proportional to L, and the product LA\ given 
by d96b is not very dependent on the code. When optimizing 
the code's degree distribution for a fixed L, we can therefore 
approximately view Ai as a constant, and d78l ) suggests that 
the optimization should maximize the maximum I c satisfying 
d83l . hence denoted i™ r ' . As L goes to infinity, d83l reduces 
to the code's monotonicity conditions (l38T l and (l39l . and this 
optimization method reduces to that in Section IV-EI 

The optimized pace of decimation is approximated by the 
code-independent (llOll l. which can be integrated to yield 



;(/) = l-(l-Z/£) 2(db " 1)/db 



(102) 



We also define l(x) as the inverse function of x(l), and 



r(0 



p + (x) = I 1 (l(x) + 1) as the mapping from to I i 

Now let 7 bc = x and 1+ = p+ (x) in the EXIT curves ( 157b and 
and we obtain the 7b needed for this pace of decimation: 



4 = 1- 



l-p+(x) 



1 



l-p+(x) 



(103) 



The condition 
Since /(0) = v c i (when 1^ 



9(y) g(l-I c f(x)Y 
means that Jb| x =o > an d dl^/dx > 0. 



degree- 1 c-nodes have average MI I c while all other c-nodes 
output all-?, so / c b = I c v c i), the former is equivalent to 



l-g-^l-p+iO)) 



(104) 



On the other hand, dlb /dx > is equivalent to 

g(y) . , f(x)-(i- P +(x)) 
g(y) p+'{x) 

which is similar to ( |65] l except with (1— x) replaced by q(x) := 
(1 — p + {x))/p + '(x). Under the erasure approximation, where 
g(y) /g'(y) = 2//(<4 — l) by doTt . it is thus sufficient to change 
the s(x) in $3} into 

S ( L Hx) = J2 VoiX*- 1 + (d h - l)q(x) ^(d - l)v cd x d ~\ 
d d 

and replace v c \ = with the linear constraint 

Vd < w(1-5 _1 (1-P + (0))) 



(106) 



(107) 



corresponding to (11041 ). When not using the EA, the coun- 
terpart of s dc (x), s dc ^ L \x), can be defined in a simi- 
lar manner to Section IV-EI and r(x) becomes A L \x) = 



(x)/s^ '(x). The maximization of /* hr ^ is then a linear 



0, the c-to-b messages from 



s dc,(L)( 

programming problem similar to d68l l. except with r(x)s(x) 
replaced by A L ^(x)s^(x) and v c \ constrained by (I1071 >. 

VII. Non-binary LDGM Quantizers 

The binary LDGM quantization codes designed in the last 
few sections could, as we shall see in Section IVIIII achieve 
shaping losses that are very close to the random-coding loss. 
However, the random-coding loss of binary codes is at least 
0.0945 dB; this limitation has been observed in [9] in view 
of the performance advantage of 4-ary TCQ compared to the 
binary convolutional codes used for shaping in [7], and it is 
more evident in LDGM quantization codes. From Fig. Q] it 
is clear that non-binary codes, i.e. those with a larger m, are 
necessary. 

In channel coding, two types of approaches exist in deal- 
ing with non-binary modulation schemes such as 4-PAM/16- 
QAM: one may use a binary channel code and modulate 
multiple coded bits onto each channel symbol, as in bit- 
interleaved coded modulation (BICM) with iterative detection 
[30], [31]; alternatively, a non-binary channel code such as 
trellis-coded modulation (TCM) [32] or a non-binary LDPC 
code can be used, such that one coded symbol is mapped 
directly to a channel symbol. Similar methods can be applied 
to MSE quantization. TCQ, for example, has a 4-ary trellis 
structure just like TCM. The use of LDGM codes over Galois 
field GF(2 K ) for quantization, as proposed in [33], also fits 
in this category. However, [33] does not consider decimation 
issues and degree distribution optimization much, and these 
problems are more complex for such non-binary LDGM codes. 
In MSE quantization, where the mapping between GF(2 K ) 
and the modulo- 2 K structure of the reproduction alphabet 
is not natural anyway, such complexity seems unjustified. 
Therefore, we have instead adopted a BICM-like approach in 
[1], where the LDGM code itself is still binary and every 
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Fig. 9. The factor graph of the 2 -ary LDGM quantizer when K = 2. 
Note that all c-nodes connecting to the same u-node have the same left- 
degree. The factor graph also has a perturbed form akin to Fig. |2(b)| with a 
2^ -ary variable node 3j connecting to each Uj. 

two coded bits in a codeword are Gray-mapped to a 4-ary 
reproduction symbol, and we have found that this approach 
allows near-ideal codes to be designed under the erasure 
approximation with relative ease. 

In this section, we will propose an improved version of the 
scheme in [1], which also has near-ideal MSE performance 
but allows even simpler code optimization, and is applicable 
to general 2^-ary, not just 4-ary, cases. Most of the optimiza- 
tion methods proposed in the previous sections will then be 
extended to this scheme. 

A. Quantizer Structure 

The m-ary LDGM quantizer with m = 2 K uses the 
codebook A = U + mZ n , where each codeword u 6 U 
is obtained by Gray-mapping every K consecutive bits in a 
binary LDGM codeword c of length n c — Kn into an m- 
ary symbol in u. Denoting the generator matrix of the binary 
(n c ,?ib) LDGM code by G, its rib = nR information bits 
by b, the Gray mapping function by 0(-) (e.g. 0(00) = 0, 
0(10) = 1, 0(11) = 2, 0(01) = 3 for K = 2)0 and denoting 
jk = K(j — 1) + k, we have 

c = bG, Cj := (cj 1( c i2 , . . . ,Cj K ), (108) 
^=0(c,), j = l,2,...,n. (109) 

The corresponding factor graph for q y (b) is shown in Fig. [9] 
where the c-nodes represent ( 11081 l and the u -nodes represent 
(f!09l . Each factor e^te-^)! in ©, with 1 = [-m/2, m/2), 
is included in the priors A", which now has m components 
since Uj is m-ary: 

A » = -^-e-^-^i = p z (( Vj - u) z ). (110) 

The quantization algorithm in Fig. [10] then follows from the 
BP rules on this factor graph. 

22 The optimization methods below appear to be usable for other mappings 
</>(■) as well. Indeed, </>(■) can even conceivably be a vector-valued mapping 
for y being a sequence of vectors, which results in a form of vector precoding 
[11], though various details remain to be worked out. 
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{compute the 2 ff -ary priors AH; X= [-2 K ^ 1 ,2 K ~ 1 )} 
AH (it) <^= p z (( yj - u) x ), j = 1, . . . ,n, it = 0, . . . , 2 A ' - 1 

^ *' ^Tki *. * = 1. • • • . «b. i = 1, • • • , 1, fc = 1, . . . , K 

<=*, i = 1, . . . ,n h 
E •<= {1, 2, . . . , m,} {the set of bits not yet decimated} 

<5max <= 0, I hc <J= 

repeat {belief propagation iteration} 

for j = 1 to n do {BP computation at Uj } 

Compute^ with fTTTl for k = 1 K 

end for 

for s = jh = 1 to ra c do {BP computation at Cj k } 
end for 

for i = 1 to do {BP computation at b;} 

<= A ' © (®.'€JV*\{.}^)' S 6 ^ 

end for 

Estimate IT' and do decimation as in the binary case 
until £ = 

bi <^ (resp. 1) if A^ = (or 1), i = 1, ,n b 
Compute c and it from b with 11081 and 11091 

z j = (% _ u j)i> x i = Vj - Zj, j = l,...,n 

Fig. 10. The 2 x -ary quantization algorithm. The decimation part is almost 
the same as the one in Fig. fj] so it is not reproduced here. 

B. Code Optimization: Erasure Approximation 

The LDGM code here is still b-regular and c-irregular, with 
all b-nodes having right-degree db- To simplify analysis, we 
make all c-nodes connecting to the same u-node have the same 
left-degree, which is called the c-degree of the u-node. We 
denote by Wd the fraction of u -nodes with c-degree d, and by 
Vd = Kdwd/ '{Rdb) the corresponding fraction of edges. 

Using essentially the same argument as in Section IV-AI 
under the monotonicity conditions a reference codeword de- 
noted by u*, c* and b* can be found with the TTD, and the 
corresponding quantization error z* = (y — ti*)i~ is strongly 
typical with respect to p z (z). 

As in the binary case, we begin with the simpler erasure 
approximation, which can serve as a starting point for more 
accurate methods. Similar to Section IV-BI EA assumes that 
e.g. /be is determined solely by 7 cb and can be computed by 
assuming the density of c-to-b messages to be erasure-like with 
respect to the reference codeword. Clearly, with a fraction 1^ 
of decimated b-nodes, the output IZ. and Jb,ext from b-nodes 
are still given by d35l l and (f3Tb . Below we compute the c-curve 
relating the output J c b from c-nodes to their input /be- 

Consider a u-node Uj with c-degree d. Due to EA, each 
incoming c-to-u message nf , must be either c* or *, with 
the former occurring with probability (/be) ■ Each outgoing 
message is given by 

which depends on the set 

S = {k' € {1, • . . , K}\{k} | v? klj = ^} (H2) 

of used incoming messages that are "known". It is now useful 
to define auxiliary random variables u, c and y, such that 
u = 0(c) is 0, 1, . . . , m — 1 with equal probability and y G 
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[0, to) has conditional pdf p(y | u) = p z ((y — u)i). p(y) = 
J2uP(u)p(y I u) is then a uniform distribution over [0, to) and 
p(ii | y) = p z ((y — u)x), so dl 101 ) becomes simply 

K(u) = p{u = u\y = yj), u = 0, 1, . . . ,m - 1, (113) 



and ( 111 U becomes the conditional distribution (omitting c- 
independent factors Q 



E 

c:c k =c,cs=c 



p{u = (j>(c)\y = yj) 



p{ck 
p(c k 



c, c 5 = c 



c\c s f-,,.// //,K 



(114) 

(115) 
(116) 



To obtain the average MI Z c b, we first average H(jijj ) = 
H(dk | C5 = c* s ,y — yj) over j for a given fe and S. For 
n — > 00, using the typicality of z* with respect to p z (z), this 
yields the average conditional entropy 



H c (k,S)=H(c k \c s ,y), 



(117) 



which can be computed using the above probability distribu- 
tions of c and y. Among u-to-c messages from u-nodes with 
c-degree d, k = 1 , .... K with equal frequency and each S 
with \S\ = k' occurs with probability 1^ ■ (1 - l£ c ) K ~ 1 ~ k ' , 
therefore if we define, for k' = 0, . . . , K - 10 



H c , k , - 



1 * 



H c(k,S), (118) 

k=l SC{l,...,K}\{k} 
\S\=k' 



K-l 



Ic.k' — 1 — H c l 



1 ^ 



(119) 



the average MI of these messages is then 

A'-l 

k 



i uc , d = E ( K „ l \ ■ I** ■ 4f • (i - iL) K - l ~ k '- (120) 



fc'=0 



Finally, since the b-to-c message density is assumed to be 
erasure-like, a look at the local tree-like neighborhood of a 
c-node reveals that 

Icb = ^ v d I uc,dltc 1 = E VdI ^ k ' ' a k'Ahc), (121) 



k'd 



where 



(122) 



Having obtained the EXIT curves (l35l . &7\ and (TUTi the 
EBP curve can be defined just like the binary case, as the 
relationship between the 7t> making l£ = Jbc and ib,ext0The 

23 £5 = c* s is abbreviation for = c* fc , Vfe £ <S. 

24 This Z c satisfies Jf J c = A" - H(c \y) = K - Ht due to OTTl . 

25 The area below this erasure-approximated EBP curve, as defined in Fig. [5] 
can be found to be KI C / R — d^vil c fi + (1 — (1 — Vi I c ,o) b )> which equals 
KI C /R when vi = and slightly smaller otherwise. Interestingly, this is 
the same as the binary case except that I c becomes KI C and v c il c becomes 
^i^c,o- As in footnote [13] the MAP EXIT curve can also be defined, and the 
area below it under the equi-partition condition is now KI C /R as well. 



monotonicity conditions are again d38l and (|39l ; the former 
means vi = 0, and the latter, dl^/dx > (x — /be), becomes 



K-l 



hk' -Sfc'(z) < 1, ze [0,1], (123) 



k'=0 



where 



(x) = E Vd ( a k'Ax) + (1 - x)(d b - l)a! k , 4 {x)) 



(124) 

For a given degree distribution, the monotonicity threshold t thl ' 
(or the corresponding I c denoted by I^ hr ) I s tne maximum t 
such that (1123b holds. Since all / c ,fe''s are increasing functions 
of t, the degree distribution with the largest t thr can be found 
via a linear search for the maximum t at which the linear 
constraints ( 1123b and 



A' 



(125) 



on Vd, with d £T> given by ( 145b . are feasible. As in the binary 
case, we can then use t — t thl in the quantization algorithm. 

In practice we often have a good guess t* (e.g. to(R) when 
db is large enough) of i thr , along with the corresponding I* k , 
and I*. If t* is close to t thr , we can approximately view 
Ic,k' I Ic as t-independent constants 7^ := I* k ,/I*, and (11231 ) 
then becomes (|4~TT > with s(x) given by 



s(x) 



j<:-i 

E 

fe'=0 



7fc' • s k '(x), 



(126) 



so the above optimization is again a linear programming 
problem (1441 



C. Coc/e Optimization: Density Evolution 

As in the binary case, it is expected that discretized density 
evolution will yield better codes by avoiding the erasure 
approximation. The method used is essentially the same; the 
only difficulty lies in the computation of the outgoing u-to-c 
message density from u-nodes with c-degree d, for which the 
K — 1 incoming c-to-u messages follows i.i.d. a given density. 
When K — 2, this u-to-c density can be computed with a 
two-dimensional lookup table on the quantized incoming c- 
to-u L-value and the quantized yj, much like the lookup table 
used at c-nodes. 

For larger K, this table-lookup method requires a table with 
K dimensions, and the resulting computational complexity is 
likely impractical. We have not investigated this case in detail, 
as K — 2 is already sufficient for MSE quantization, but it 
seems that a Monte-Carlo approach may be effective for such 
density computation at u-nodes. 

The DE results can be used to obtain the EXIT curves, and 
the monotonicity threshold be thus optimized, in essentially 
the same manner as Sections [V-D| and [V-E| In the computation 
of the correction factor r(x), ( 1126t should be used as the 
reference s(x). 
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D. Pacing of Decimation 

Under a finite number L of iterations, the approximate rela- 
tionship d78l between MSE and delta-area still holds according 
to simulation results (but Po is now m 2 /12), therefore we can 
still optimize the pace of decimation by minimizing the delta- 
area with the same methods in Section IVI-DI In particular, 
(llOU is unchanged from the binary case. 

The method in Section IVI-EI can still be used to optimize 
the degree distribution under a finite number of iterations 
with a given pace. However, now l c f(0), the I c b when b- 
to-c messages are all-*, should be UiJ C) o according to dl21l ), 
in which the erasure approximation is exact here. Therefore 
( 11041 ) should be replaced by 

l- 5 - 1 (l-p+(0)) l-g-^l-p+iO)) 



and the corresponding linear constraint ( 11071 ) becomes 

l-g-\l-p+(0)) 



vi < s r 



7o 



(127) 



(128) 



Finally, s™(i) now has the same form as s(x) in (11261 ). 
except with the (1 — x) factor in (1124b replaced by q(x) = 
(l-p+(x))/p+'(x). 

VIII. Simulation Results 

In this section we evaluate the MSE performance of our 
quantization codes by Monte Carlo simulation. For our un- 
ary code (m = 2, 4), without loss of generality each source 
sequence y is uniformly sampled from [0,m] n , quantized 
to x, and the MSE is then evaluated as i 53j=i \Vi ~ x j\ 2 - 
Denoting a 2 as the average MSE over a number of source 
sequences used in the simulation (usually 20 at n — 10 5 and 
more for smaller n), the shaping loss can be estimated by 
101og 10 (G(A)/G*), with 



G(A) 



a 2 p 2 / 



~ = (2*/ 



R 'm) 2 27rea 2 . 



(129) 



G* (2ne) 

We will first evaluate long-block performance (n = 10 5 ) 
of binary and 4-ary codes, then the impact of smaller block 
lengths n will be investigated. Unless otherwise noted: 

• The degree distribution is optimized with one of the 
following methods: 

1 ) DE: maximize with quantized density evolution 
(Sections IV^IVTTCl) ; 

2) EA: maximize 7* hr under the erasure approximation 
(Sections EE HV^Cl IVII-Bk 

3) DE-PO: maximize I c thr L (L = L ) with quantized 
DE (Sections |VTEl IVII-DD ; 

4) EA-PO: maximize I* hT > L with EA (Sections |VTEl 
NTTHi . 

• The code is randomly generated from the degree distribu- 
tion by random edge assignment, followed by the removal 
by pairs of duplicate edges between two nodes. 

• The t used in the quantization algorithm is to(KI^ hl ) 
or to(KI* hl > L ), such that I c = / c thr( ' L) . When the EA 

thrf L) 

or EA-PO method is used, this I c is the erasure- 
approximated result; the true I* hl ( ,L > is lower. 



• The greedy decimation algorithm is used. 

• The pace of decimation is given by dlOlt . 

• The decimation process is controlled to make the actual 
iteration count L close to the target Lq, using the throttled 
algorithm if Lq is marked with a prime (e.g. Lq = 10 3/ ), 
and the unthrottled algorithm otherwise. 

• The recovery algorithm in Section IVI-CI is not used. 

A. Performance of the Greedy Decimator at n = 10 5 

For binary codes, the random-coding loss is significant, 
therefore we choose the code rate R = n^/n = 0.4461 b/s 
with to(R) = 4, where the random-coding loss of 0.0976 dB 
is close to minimum. 

For 4-ary codes, the code rate is chosen to be R = rib l n = 
0.9531 b/s at t = 2 in some cases, where the random-coding 
loss of 0.0010 dB is close to minimum. However, for moderate 
iteration counts L there are now a large range of rates for 
which the random-coding loss is small compared to the loss 
due to the delta-area, and d78l suggests that the latter loss 
increases when higher rates are used, since Po becomes a 
larger multiple of P t . Therefore, we also experiment with 
somewhat lower rates that may give better MSE performance. 

On the choice of db, we note that gap between AT/' 1 " 
and its ideal value R decreases rapidly as db increases, but 
computational complexity also increases, and the finite-n loss 
may worsen when the factor graph is denser. Therefore, we 
choose db such that the maximum c-degree is about 50-100. 

Results are shown in Table [TTJ KI^ hi is shown for each 
code optimized with the DE method (the factor K makes it 
easy to compare KI^ hl with its ideal value R), and when the 
DE-PO method is used KI^ hl ' L is shown instead in italics 
to indicate the choice of t = t (KI* kT ' L )^ The LA\ value 
is obtained from ( 11011 ), d90"l >, d9lT l and d93l) : technically it is 
only applicable when L — > oo but in practice its accuracy is 
good even when L = 100. The four losses that follow are with 
respect to the ideal MSE -P t *(m defined in Section Hl-DI and 
they are respectively 

1) the random-coding loss; 

2) the TTD loss corresponding to the MSE P t thr achieved 
by the TD, when L — > oo and it is able to synchronize 
with the TTD; 

3) the loss estimate ( l78l . in which we divide LA\ above 
by the actual average iteration count L to obtain A\; 

4) the actual shaping loss ( 1129b from simulation results. 
Several observations can be made: 

• The shaping loss decreases as the iteration count L 
increases, and can approach the random-coding loss and 
even be lower than the TTD loss (because the greedy 
decimator is better than the TD) when L is large. 

• At small L, adjusting the degree distribution according 
to L with the DE-PO method can improve performance 
significantly. 

26 In the iterative optimization process in Section IV-EI the of an 

optimized code can either be obtained from (68) as 1/s 

max, or more 

accurately, by making it the base code, rerunning DE on it, and computing 
7* hr from 0|>}. / c thr (but not /* hr - L ) i n Table |n| is computed with the latter 
method. 
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TABLE II 

Performance of LDGM Quantization Codes at n = 10 5 



Lo 


K 


R (b/s) 


<4 


Method 


KI W) 


LA; 


Losses: 10 log lc 




) (dB) 


L 


P t {R) 


P t thr 


CD 


Actual a 2 


10 2 


1 


0.4461 


12 


DE 
DE-PO 


0.4427 
0.4525 


3.44 
3.46 


0.0976 


0.1174 
N/A 


0.3479 
0.2921 


0.3241 
0.2721 


100 
LOO 


2 


0.9531 
0.4898 


11 

20 


DE 
DE-PO 
DE-PO 


0.9460 
0.9672 
0.5010 


3.44 
3.46 
3.47 


0.0010 
0.0010 
0.0369 


0.0437 

N/A 
N/A 


0.6441 
0.5282 
0.2306 


0.4949 
0.3962 
0.2676 


LOO 
LOO 
99 


10 3 


1 


0.4461 


12 


DE 
DE-PO 


0.4427 
0.4442 


3.44 
3.45 


u.uy /o 


0.1174 

N/A 


0.1466 
0.1377 


0.1537 
0.1443 


809 
815 


10 3 ' 


1 


0.4461 


12 


DE 
DE-PO 


0.4427 
0.4442 


3.44 
3.45 


0.0976 


0.1174 

N/A 


0.1402 
0.1318 


0.1426 
0.1400 


1036 
1023 




2 


0.9531 
0.6285 


11 

17 


DE 
DE-PO 


0.9460 
0.6256 


3.44 
3.49 


0.0010 
0.0130 


0.0437 
N/A 


0.1049 
0.0660 


0.0876 
0.0741 


1046 
1022 


10 4 


2 


0.9531 


11 


DE 


0.9460 


3.44 


0.0010 


0.0437 


0.0608 


0.0565 


3778 


10 4 ' 


1 

2 


0.4461 
0.9531 


12 
11 


DE 
DE 


0.4427 
0.9460 


3.44 
3.44 


0.0976 
0.0010 


0.1174 
0.0437 


0.1210 
0.0514 


0.1245 
0.0423 


6678 
8356 



• At a given L, the loss due to the finite L is larger 
for higher rates. Therefore, for 4-ary codes it is indeed 
helpful to small-!/ performance if a smaller R than that 
minimizing the random-coding loss is used. 

• For binary codes the random-coding loss becomes dom- 
inant at large L and limits the achievable shaping loss. 

• LAi is virtually code-independent. 

> The shaping loss can be well predicted by (178k it is 
not entirely accurate because the formula itself is only a 
heuristic, it is given for TD-with-recovery but here used 
with and also because it ignores the difference be- 

tween the throttled and unthrottled decimation algorithms 
and the loss due to finiteness of n. 

Through better degree distribution optimization methods, 
pacing of decimation, and choice of code rate, we have 
achieved in Table iHl better MSE performance than in [1] at 
the same complexity. In Table [Hi] we analyze the contribution 
of each individual improvement to the MSE performance of 
4-ary LDGM quantization codes. Starting with the method of 
[1] in the first row, which uses a slightly different code con- 
struction, EA-based optimization method and uniform pacing, 
we introduce one by one the following improvements in the 
subsequent five rows: 

1) The code construction in Fig. [9] optimized with EA; 

2) Optimized pace of decimation in dlOlt ; 

3) Pacing-aware code optimization in Section [VTEI 

4) The use of lower rates (0.4898 b/s for L = 10 2 and 
0.6285 b/s for Lq = 10 3 ') than the random-coding-loss- 
minimizing 0.9531 b/s rate used in previous rows; 

5) Quantized DE that avoids the erasure approximation 
used in previous rows. 

c?b = 11 is used in all but the first row, where the right- 
degree of each b-node is 2db = 12 [1]. The average actual 

27 As will be shown in Table llVl the greedy decimator is much less sensitive 
to code optimization and to the choice of t (or / c ) than TD with recovery, 
so its performance tends to be better than the estimate fl78t when _R"/* hr is 
significantly lower than its ideal value R. 



TABLE III 

Effects of Various Optimizations on Shaping Loss (dB) of 4-ary 
LDGM Codes 



Code 


L = 10 2 


L = 10 


3/ 


[1], unif. pace 


0.5420 (100) 


0.1022 (953) 


0.0991 


EA, unif. pace 


0.5530 (99) 


0.1037 (948) 


0.1002 


EA, opt. pace 


0.4594 (100) 


0.0875 (995) 


0.0872 


EA-PO 


0.3641 (100) 


0.0847 (988) 


0.0839 


EA-PO, low R 


0.2501 (99) 


0.0861 (960) 


0.0834 


DE-PO, low R 


0.2676 (99) 


0.0741 (1022) 


0.0756 



iteration counts L are shown in parentheses. Since L varies 
considerably when Lq = 10 3 ', for the purpose of a fairer 
comparison, we also show in italics the adjusted shaping losses 
approximately corresponding to L = 10 3 l 28 l 

We observe from Table [HI] that improvements |2), and |4|i 
are all important when Lo = 10 2 , but quantized DE (compared 
to EA) is only helpful when Lq = 10 3/ or larger, in which case 
it can decrease the shaping loss by about 0.01 dB. Technically, 
as is evident from Fig. [7] the codes optimized by EA usually 
have significantly suboptimal true monotonicity thresholds, but 
apparently the greedy decimator, unlike the TD with recovery 
on which our analysis is based, can avoid most of this loss. 
We will further investigate this below. 

B. Performance of the Typical Decimator 

Having discussed the greedy decimator, now we look at the 
typical decimator on which most of our theoretical analysis is 
based. Good performance from the TD requires the use of the 
recovery algorithm, which we have only implemented for the 
binary case as shown in Section [VI-Cl 29 l therefore only binary 

28 The adjustment uses the tradeoff 0.66 ■ 10 -4 dB per iteration between 
shaping loss and L. This tradeoff factor is obtained by reducing Lo from 
1000' to 935' for the last row in Table Hill the resulting shaping loss increases 
by 0.0040 dB to 0.0781 dB and L decreases by 60 to 962, and 0.0040/60 = 
0.66 ■ 10" 4 . 

29 A similar algorithm for the 2 A -ary case is conceivable but significantly 
more complex, since the desired distribution of some yj would depend on K 
incoming messages fj, cu ., rather than just one u z in the binary case. 
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TABLE IV 

Shaping Loss (dB) of Binary LDGM Codes with the Typical and 
the Greedy Decimators at n = 10 5 





Code 


Est. 


TD 


TD-R 


GD 


GD-R 


Lo 


= 10 2 ,DE-PO 


0.2894 


0.9128 


0.2923 


0.2721 


0.2291 


Lo 


= 10 3 ,DE-PO 


0.1330 


0.4678 


0.1479 


0.1443 


0.1296 


U> 


= 10 3 ,EA-PO 


0.2530 


0.4592 


0.1834 


0.1463 


0.1741 


He- 


0.4469, 0.3836) 


0.4871 


0.5888 


0.4968 


0.1526 


0.2649 



codes are considered here. 

The results are shown in Table [TV] for the two binary codes 
in TablelTIloptimized with method DE-PO at respectively Lq = 
10 2 and Lo = 10 3 . We additionally include the code optimized 
with EA-PO at the same R, db and Lq as an example of one 
with a poor monotonicity threshold: its erasure-approximated 
7 thr,L is 0.4469, but the true 7 c thr ' L is much lower at 0.3836 
due to the use of EA. The shaping losses of this code for J c 
at 0.4469 and at 0.3836 are shown respectively in the third 
and fourth row of Table [TV] TD and GD denote the typical 
and the greedy decimators, while TD-R and GD-R refer to the 
corresponding decimators with the recovery algorithm. The 
loss estimates are obtained via d78l ). with A\ computed from 
DE results without using the large-!/ approximation, so they 
differ slightly from the estimates in Table [II] 

We see that the typical decimator by itself performs rather 
poorly, but with recovery its MSE performance is at least close 
to that predicted by d78l . This can be observed more clearly 
from Fig. [TT] When TD is used without recovery, imperfect 
decimation causes the message densities to become far from 
consistent, in turn making the MI of the extrinsic v° messages 
far lower than the Ib.cxt predicted by DE, which is only 
accurate for consistent densities close to those encountered 
in the DE process. This, in effect, greatly increases the delta- 
area and thus the MSE. With the recovery algorithm, the Ib,ext 
from the quantizer matches much better (though not perfectly) 
with the DE result, showing that the message densities have 
been kept mostly consistent! 30 ! 

Table [IV] also shows that, for the first two well-optimized 
codes whose 7* hr ' L are close to ideal, TD-R and GD have 
similar performance, and GD-R works even better, suggesting 
that the recovery algorithm (whose complexity is a moderate 
0(n) per iteration) is also useful in practical quantizers. 
However, when using the code optimized with EA-PO and 
thus having low 7* hr,i , GD performs decidedly better than 
TD-R and even GD-R; apparently, GD is much less sensitive 
to the code or to the choice of I c . 

C. Finite-Length Effects 

Like LDPC codes with random edge assignment, LDGM 
quantization codes require large block sizes n to perform well. 

30 The loss due to imperfect recovery is not as large as that estimated by )78t 
though, if the area between the EBP curve and the TD (TD-R) curve in Fig. II II 
is used as A ; . The estimated losses are 0.3925 dB for TD-R and 1.4594 dB 
for TD, but the actual shaping losses are only respectively 0.2925 dB and 
0.8797 dB for the source sequence used. The likely reason for this discrepancy 
is that our method for estimating message Mis in Section IV-BI is accurate 
only for symmetric message densities, so it does not well characterize the 
deviations of the message densities from consistency (symmetry). 
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Fig. 11. Comparison of EBP and actual curves with TD and TD-R at 
Lq = 10 3 . The curve labeled "DE" is the actual curve computed from DE 
results as used in Section lVI-DI The curves labeled "TD" and "TD-R" are the 
trajectories of (/bj^b.oxt) followed by the actual quantizer when decimating 
a source sequence using the respective decimators, where 1^ is the fraction 
of decimated bi's and 7b,cxt is the average of 1 — H(iq). 

TABLE V 

Shaping Loss (dB) of Short 4-ary LDGM Codes at Lq = 10 3/ 



n 


LDGM (0.6285 b/s,DE-PO) 


2 11 -state TCQ 


SC bound 


100 000 


0.0741 


0.1335 


0.0005 


30 000 


0.0929 


0.1339 


0.0014 


10 000 


0.1297 


0.1362 


0.0036 


3 000 


0.2096 


0.1394 


0.0104 


1 000 


0.3225 


0.1515 


0.0263 


300 


0.5100 


0.1901 


0.0703 



As an example, we consider the R = 0.6285 b/s 4-ary code 
designed with the DE-PO method for L = 10 3 ' in Table HD 
and its small-rt shaping losses at this Lo are shown in Table IVl 
For comparison, we also include in Table [VI the shaping losses 
of TCQ, as well as the sphere-covering (SC) bound [12] 



G(A) er(n/2 + l) 2 A 



G* ^ " V n/2 + l ' (130) 
which is a lower bound of MSE at finite n, derived for exactly 
spherical Voronoi regions of A. 

We observe from Table [V] that LDGM quantization codes 
suffer significant loss when n is small. In particular, the loss 
in the sphere-covering bound scales as n~ 1 lnn, and TCQ's 
performance loss due to small n appears to scale similarly, but 
for LDGM-based quantizers this small-n loss decreases much 
more slowly as n increases. 

D. Comparison to TCQ 

For comparison purposes, we show the MSE performance 
of TCQ with long block length n = 10 5 in Table ED The 
codes have the same structure as the rh = 1 case in [32] and 
have 2 V states. In our terminology, they are thus 4-ary codes 
of rate R = (1 + v/n)h/s including tail bits. To study the 
performance trends of TCQ codes with more states than those 
found in the literature, we optimize the generator polynomials 
ourselves via random search. The resulting shaping losses 
agree with the results in [4, Table IV] and [9, Table I] available 
for v < 11, suggesting that the random search method, though 
not exhaustive, already gives near-optimal TCQ codes. 

The results in Table [Vl] confirm that TCQ can also achieve 
near-zero shaping losses, but the loss decreases only slightly 
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TABLE VI 

Shaping Loss (dB) of 2"-State TCQ at n = 10 5 



V 


loss (dB) 


;/ 


loss (dB) 


V 


loss (dB) 


v 


loss (dB) 


2 


0.5371 


6 


0.2664 


10 


0.1484 


14 


0.0951 


3 


0.4464 


7 


0.2321 


11 


0.1335 


15 


0.0853 


4 


0.3781 


8 


0.1921 


12 


0.1155 


16 


0.0784 


5 


0.3183 


9 


0.1757 


13 


0.1033 


17 


0.0705 



faster than 1/V, therefore the number of states 2 V (and thus 
the time and memory complexity) increases exponentially as 
the loss approaches zero. For example, the 0.2676 dB loss of 
LDGM-based quantization at L « 10 2 can be achieved by 
TCQ with 2 6 to 2 7 states, but the 0.0741 dB loss at L w 10 3 
would require an astronomical 2 16 to 2 17 states to achieve 
with TCQ, so the proposed LDGM-based quantizer is much 
better than TCQ at achieving near-zero shaping losses when 
n is large^H However, TCQ remains advantageous for small 
n as we have shown in Table [V] 

IX. Complexity Analysis 
A. Computational Complexity 

We now analyze the time complexity, per block of n 
source symbols, of a serial implementation of the proposed 
quantization algorithm. Dequantization obviously has much 
lower complexity and will therefore not be discussed. 

The time complexity of the belief propagation part in the 
binary case (Fig. [3i is clearly linear in the number of edges 
in the factor graphO i-e. 0(nRdb) per iteration. In the 2 K - 
ary algorithm in Fig. \W\ BP at b- and c-nodes also has this 
complexity, while at each Uj the K /i^ fc 's take 0(K2 K ) time 
to compute with (II 1 Up 3 ] therefore the total complexity of BP 
is 0(n(Rdt, + K2 K )) per iteration, whose K — 1 version is 
also applicable to the binary case. 

Within the decimation part, only the greedy decimator's 
selection of the most certain bits to decimate may have 
higher complexity. In a straightforward implementation of 
the GD in Fig. [3] the most certain &j's are selected one by 
one until either <5 ma x or A/™ 11 is reached. This incremental 
selection problem can be solved with partial quicksort; if 
rib, ; bits end up being decimated in iteration I, the selection 
has complexity 0(n^ + n^jlogn^j) in that iteration. Since 
rib = Yli n h,u this complexity averaged over L iterations 
is at most 0(rib(l + l ° s " b )) per iteration, which usually 
reduces to O(rib) since generally log rib <C L. For even 
larger rib, we note that the quantization algorithm is unaffected 
even if the decimated bits in an iteration are selected non- 
incrementally and unsorted among themselves by certainty, 
which has only O(rib) time complexity per iteration using 



3 'One may note that the LDGM code and the TCQ code have different 
rates R. However, in shaping and DPC applications, the rate of the shaping 
code does not matter much as long as the desired shaping loss is achieved, 
therefore it should be fair to compare TCQ and LDGM at their respective 
"natural" rates. 

32 Note that the computation at each b- or c-node with degree d requires 
O(d) time per iteration using the forward-backward algorithm (similar to 
BCJR), not 0{d 2 ) as is required by the naive implementation. 

33 Again, the forward-backward algorithm is responsible for the reduction 
in complexity from 0(K 2 2 K ) to 0{K2 K ). 



partial quicksort, and the limits <5 max and A/™ 11 can still be 
enforced by appropriate summing within each partition at the 
same complexity. This method is probably slower in practice, 
but it shows that O(rtb) selection complexity per iteration is 
possible in principle even when log rib ^ L. 

We thus conclude that our quantization algorithm has 
complexity 0(n(Rdb + K2 K )) per block per iteration, or 
0(L(Rdb + K2 K )) per symbol summed over all iterations. In 
practice, the most certain bits to decimate can also be selected 
with a priority queue or even by a full sort in every iteration; 
the higher complexities of these methods do not actually slow 
down the overall algorithm much. 

B. The Loss-Complexity Tradeoff 

The asymptotic loss-complexity tradeoff of LDGM quan- 
tizers can now be analyzed heuristically. For simplicity we 
assume K to be a constant, and the time complexity of the 
quantizer per symbol can then be simplified to 0(L ■ Rd^). We 
analyze the extra loss, denoted by 1/k, compared to the 2 K - 
ary random-coding loss, and n is assumed to be large enough 
that the small-n loss does not dominate this extra loss. 

Now the extra loss 1 / n consists mainly of two parts, namely 
the monotonicity threshold loss due to the gap between KI* hr 
and its ideal value R, and the delta-area loss due to the 
finiteness of the iteration count L. We have observed in TableU 
that the monotonicity threshold loss diminishes exponentially 
fast with the increase of db for BEQ, and this is apparently true 
for MSE quantization as well; more precisely, the loss appears 
to be diminishing exponentially with the average c-degree 
Rdb/K, therefore in order to reduce this loss to 0(1/ k), Rd^ 
must be on the order of log k. As for the delta-area loss, (|78| > 
suggests that it is proportional to the delta-area A- u and since 
LA[ is almost a code-independent constant in our simulations 
when I c < /* hl (' L ^ Ai is in turn inversely proportional 
to the iteration count L, therefore L on the order of k is 
necessary to make this loss 0(1/ k). The overall complexity 
per symbol necessary for 0(1/ n) extra loss is thus O(KlogK) 
according to these heuristic arguments. Note that this is similar 
to previous results and conjectures on the tradeoff between 
gap-to-capacity and complexity for LDPC channel codes; see 
[34] and references therein. 

In comparison, the complexity needed to achieve 1/k loss 
with TCQ appears from Table |VT] to be exponential in n, and 
current achievability results in [35] also achieves this 0(e K ) 
complexity only. It thus seem unlikely that a similar 0(k log k) 
complexity can be achieved with TCQ. 

C. Strengths of LDGM Quantizers versus TCQ 

From the numerical results and heuristical analysis above, 
we conclude that the proposed LDGM quantizers are superior 
to TCQ in terms of the loss-complexity tradeoff, when the 
block length n is large and near-zero shaping losses are 
desired. On the other hand, TCQ does perform better for 
n smaller than 10 3 -10 4 , and a simple 4-state TCQ may 
also suffice in undemanding applications where its 0.537 1-dB 
shaping loss is acceptable. 
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Till now we have talked about the complexity at the encoder 
(quantization) side only. In shaping applications, particularly 
DPC, the advantage of LDGM quantizers is more evident 
at the decoder side, which according to © must usually 
iteratively separate the superposition of a channel codeword u 
and a quantizer codeword a [9]. When TCQ is used and when 
the operating SNR is close to threshold, the BCJR algorithm 
must be run in full many times on the trellis, making the 
decoder-side complexity much higher than the encoder side. 
When LDGM-based quantizers are used, on the other hand, 
the inner iterations of the channel decoder (usually LDPC) and 
those on the LDGM quantization code can be interleaved, and 
in practice the total complexity is usually no higher than at the 
encoder side, both comparable to an ordinary LDPC decoder. 

It is also worth noting that increasing the number of states in 
TCQ increases both time and memory complexity, whereas a 
larger Lq in the LDGM quantizer increases only the encoder- 
side time complexity, not the memory complexity. This is, 
however, partially offset by the LDGM quantizer's need of 
larger block lengths. 

X. Conclusion 

In this paper we have designed LDGM-based codes and 
corresponding iterative quantization algorithms for the MSE 
quantization problem of K™. The optimization of the degree 
distributions is formulated, via the introduction of the TTD, 
as the maximization of a monotonicity threshold that can be 
determined using density evolution methods and optimized by 
linear programming. The finite number of iterations L is then 
accounted for by optimizing the pace of decimation and using 
a modified criterion in degree distribution optimization. 

As shown by the simulation results, the proposed quan- 
tizers can achieve much lower shaping losses than TCQ at 
similar complexity. The methods employed in the analysis 
of the decimation process, in particular the typical decimator 
synchronized to the TTD, may also prove useful elsewhere. 

The proposed LDGM-based quantizers are useful in lossy 
source coding and shaping, but in practice their good perfor- 
mance is most important in dirty-paper coding in the low- 
SNR regime. According to our preliminary investigations, a 
superimposed structure similar to [6], [7], [9] can be used 
directly, where the transmitted signal has the form @, con- 
sisting of an LDPC codeword (usually modulated into a 4- 
PAM or higher signal) containing the desired information, 
pre-subtracted known interference, plus a codeword from 
the LDGM quantizer to minimize the overall transmission 
power. The design of the LDPC code, such that the LDPC 
and LDGM parts can be correctly separated at the receiver, 
appears to be straightforward although more work is necessary 
in the details. The scheme is then expected to give better 
performance than existing TCQ-based schemes at the same 
level of computational complexity. Alternatively, in [16] a 
"nested" structure for the binary symmetric Gelfand-Pinsker 
problem has been proposed, in which the codewords of an 
LDGM quantization code are divided into cosets according to 
linear equations on b and the known interference is quantized 
into a codeword chosen from one coset that corresponds to the 



information to be conveyed. In [36], a similar construction is 
proposed for the binary erasure case. It is not difficult to extend 
this scheme to DPC on Gaussian channels, and code design, 
though much more complex, is still possible. However, as in 
BEQ, our BP-based quantizer will generally leave some hard 
constraints related the transmitted information unsatisfied, and 
the necessary overhead to correct such errors may make such 
nested codes less attractive than the superpositional structure 
above. More investigation is necessary in this aspect. 

On the quantizer itself, the currently achieved long-block 
shaping losses are already quite good, and we have been 
able to account for the losses, through theoretical analysis 
or heuristic arguments, with the random-coding loss, the 
nonideality of the monotonicity threshold, the delta-area loss 
due to finite iteration count L, and the loss due to finite block 
length n. In future work, it would be useful to rigorously 
investigate the correctness of these heuristics. Our analysis is 
also limited to the typical decimator with recovery; as we have 
shown in Section [VIII-BI the greedy decimator used in practice 
can have significantly different performance when the code is 
not well optimized in terms of 7* hr or when J c is far from 
i™ r , therefore an analysis of the GD would be interesting. 

Further improvement in MSE performance may come from 
appropriate use of the recovery algorithm, a better optimized 
strategy for controlling the decimation process (see Sec- 
tion lVI-Al . and a more refined degree distribution optimization 
method based on the results of quantized DE. In addition, 
there is still plenty of room for improvement in small-n perfor- 
mance. We have found that better edge assignment algorithms, 
such as progressive edge growth (PEG) [37], could noticeably 
improve LDGM quantizers' shaping losses for small n, though 
the improvement is not large, partly due to the change in EXIT 
curves caused by such algorithms. Larger gains may result 
from applying the PEG method more carefully, or from the 
use of non-binary or generalized LDGM codes, which may be 
viewed as a combination of TCQ and LDGM techniques. 
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